For CHERIE, LAURENS and ROWIN




PREFACE 



This Lecture Note deals with asymptotic properties, i.e. weak and strong
consistency and asymptotic normality, of parameter estimators of nonlinear
regression models and nonlinear structural equations under various assumptions
on the distribution of the data. The estimation methods involved are nonlinear
least squares estimation (NLLSE), nonlinear robust M-estimation (NLRME) and
nonlinear weighted robust M-estimation (NLWRME) for the regression case and
nonlinear two-stage least squares estimation (NL2SLSE) and a new method called
minimum information estimation (MIE) for the case of structural equations.

The asymptotic properties of the NLLSE and the two robust M-estimation methods
are derived from further elaborations of results of Jennrich. Special attention
is paid to the comparison of the asymptotic efficiency of NLLSE and NLRME.
It is shown that if the tails of the error distribution are fatter than those of
the normal distribution, NLRME is more efficient than NLLSE. The NLWRME method is
appropriate if the distributions of both the errors and the regressors have fat
tails.

This study also improves and extends the NL2SLSE theory of Amemiya. The method
involved is a variant of the instrumental variables method, requiring at least
as many instrumental variables as parameters to be estimated. The new MIE method
requires fewer instrumental variables. Asymptotic normality can be derived by
employing only one instrumental variable and consistency can even be proved
without using any instrumental variables at all.

The asymptotic results are not only derived under the assumption that the 
observations are independently distributed but also for the case where some 
explanatory variables are lagged dependent variables. 

The last chapter deals with some empirical applications of the NLRME method and 
the MIE method. 

This study is a slightly revised version of my Ph.D. dissertation submitted
to the Faculty of Economics of the University of Amsterdam. I would like to
express my thanks to professor Christopher A. Sims of the University of Minnesota
for his willingness to act as Ph.D. supervisor. His helpful suggestions and
remarks benefited the final results and stimulated me to go on with the lonely
job of writing this book. Professor J. Th. Runnenburg guided me on the slippery
path of probability theory. Being a self-taught man in this matter, I am much
indebted to him for his criticism of previous drafts and for showing me how to
do the mathematics properly. Of course, only I am responsible for any remaining
errors or sloppiness. I also wish to express my thanks to my teacher professor
J.S. Cramer who read the manuscript and offered valuable suggestions.







Preliminary versions of some parts of this book have been disseminated as
working papers. I acknowledge the comments of professor Robert I. Jennrich, who made
me aware of the literature on robust M-estimation, and professor Benoit Mandelbrot,
who suggested some additional references.

I have written this book during the time I was a research fellow of the 
Foundation for Economic Research of the University of Amsterdam. The Foundation 
provided me with time, computer facilities and secretarial help. I especially 
express gratitude to Mrs. Tjiam Hoa Nio for typing the various drafts. 

April 1981 Herman J. Bierens 
1 INTRODUCTION

1.1 Specification and misspecification of the econometric model

Econometrics can be characterized as the science of obtaining information
from economic data using mathematical-statistical methods. This description
covers theoretical econometrics, which is concerned with the question of what
the best way of obtaining this information is, as well as empirical
econometrics, which is concerned with doing the actual job. Application of
mathematical-statistical methods implies the use of econometric models. An
econometric model is [Cramer (1969, p.3-4)] "a set of hypotheses that permits
statistical inference from the particular data under view". These hypotheses
are borrowed partly from economic theory. However, [Cramer (1969, p.2)]
"unfortunately economic theorists set great store by generality, and their models
are therefore as a rule insufficiently specific to permit an empirical application.
As a consequence, virtually all econometric studies add specific hypotheses of
their own which are appropriate to the particular situation under review.
These convenient approximations are dictated by the requirements of statistical
estimation; they are based on common sense rather than on abstract economic
theory". Of course not all economic theory is badly specified. An exception is
consumption theory, where the form of the demand function is derived by maximizing
utility under budget constraints. But even then the utility function has to be
specified.

Even when the functional form of the model is supplied by economic theory,
the econometrician has to make assumptions about the stochastic nature of the
model, because economic theory is usually deterministic while deterministic
economic models never fit the data perfectly. Thus a disturbance term *) is added
to the model and usually the assumption is made that this disturbance term has
zero expectation and finite variance or even that it is normally distributed.

The assumption of normality of the disturbance term is usually based on the
argument that this disturbance term represents the total impact of a large
number of variables not considered in the model, each having only a small
impact itself. These small impacts are considered as independent random drawings
from a distribution with finite first and second moments and then, referring to
the central limit theorem, normality is postulated. However, why should economic
variables have finite moments at all? It is well known that the distribution of
the highest incomes can be described by the Pareto law, that is: if $\underline{y}$ **) is a
random drawing from the population of incomes larger than $y_0$, then

$$P(\underline{y} \le y) = 1 - \left(\frac{y_0}{y}\right)^{\alpha}, \quad \alpha > 0,$$

provided that $y_0$ is large enough. This distribution has a finite second moment
only if $\alpha > 2$, while in empirical research $\alpha$'s are found which are
smaller than 2. Also other economic variables may have distributions without
finite second moment. Mandelbrot (1963 a,b, 1967) and Fama (1963, 1965) have found
empirical as well as theoretical evidence for such distributions, especially
with respect to speculative prices. But if some economic variables have infinite
second moments, then the same may be the case for disturbance terms. Not
recognizing this may have serious consequences for least squares estimators.

*) Which will also be called "error term" or shortly "error".

**) Following the Dutch convention, random variables, -vectors and -functions
are underlined, to distinguish them from values they may take.

For, if $\underline{y}_j = \beta x_j + \underline{u}_j$, $j = 1,2,\dots$, where the $x_j$ are non-stochastic
explanatory variables and the $\underline{u}_j$ are random drawings from the Cauchy $(0,\sigma)$
distribution [N.B.: the density of this Cauchy distribution is
$f(u) = \frac{1}{\pi}\,\frac{\sigma}{\sigma^2 + u^2}$], then the least-squares estimator $\underline{b}_n$, say, of $\beta$ is
also Cauchy distributed:

$$\underline{b}_n = \frac{\sum_{j=1}^{n} x_j \underline{y}_j}{\sum_{j=1}^{n} x_j^2}
= \beta + \frac{\sum_{j=1}^{n} x_j \underline{u}_j}{\sum_{j=1}^{n} x_j^2}
\sim \text{Cauchy}\left(\beta,\ \sigma\,\frac{\sum_{j=1}^{n} |x_j|}{\sum_{j=1}^{n} x_j^2}\right).$$

Moreover, if $c_1 = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} |x_j|$ and $c_2 = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} x_j^2$ exist and are
positive then $\underline{b}_n$ converges in distribution to Cauchy $(\beta, \sigma c_1/c_2)$, so that the
least squares estimator $\underline{b}_n$ fails to be consistent. In the case that the $\underline{u}_j$'s
are i.i.d. $N(0,\sigma^2)$ we obviously have $\text{plim}\ \underline{b}_n = \beta$, provided that
$\sum_{j=1}^{n} x_j^2 \to \infty$ as $n \to \infty$.
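The contrast between Cauchy and normal errors can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the regressor is taken to be $x_j = 1$ for all j (so that $c_1 = c_2 = 1$), $\beta = 2$ and $\sigma = 1$ are arbitrary choices, and the helper names are hypothetical. The spread of the estimator is measured by the interquartile range over repeated samples, because the Cauchy distribution has no variance.

```python
import math
import random

def ols_slope(x, y):
    """Least squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def cauchy(rng, sigma=1.0):
    """Draw from Cauchy(0, sigma) by inverting its distribution function."""
    return sigma * math.tan(math.pi * (rng.random() - 0.5))

def interquartile_range(values):
    s = sorted(values)
    return s[(3 * len(s)) // 4] - s[len(s) // 4]

def spread_of_estimator(n, draw_error, reps=200, beta=2.0, seed=0):
    """IQR of the slope estimates over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    x = [1.0] * n                      # assumed non-stochastic regressor
    estimates = []
    for _ in range(reps):
        y = [beta * xj + draw_error(rng) for xj in x]
        estimates.append(ols_slope(x, y))
    return interquartile_range(estimates)

for n in (10, 1000):
    iqr_cauchy = spread_of_estimator(n, cauchy)
    iqr_normal = spread_of_estimator(n, lambda rng: rng.gauss(0.0, 1.0))
    print(n, round(iqr_cauchy, 3), round(iqr_normal, 3))
```

With normal errors the interquartile range shrinks as n grows, whereas with Cauchy errors it stays roughly at the interquartile range of the Cauchy $(\beta, \sigma c_1/c_2)$ distribution, which is in line with the inconsistency argument above.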



We have just mentioned, referring to the work of Mandelbrot and Fama, that
it is not necessary that distributions of economic variables have finite
variances. The argument of these authors is partly based on a theorem from
probability theory [see Feller (1966) chapter XVII], which states that if a
partial sum $\sum_{j=1}^{n} \underline{u}_j$ of i.i.d. random variables $\underline{u}_j$ has a non-degenerate limiting
distribution G, say, (which means that there are numbers $a_n$, $b_n$ such that
$\lim_{n\to\infty} P\left(a_n\left(\sum_{j=1}^{n} \underline{u}_j - b_n\right) \le x\right) = G(x)$ for all continuity points x of G), then this
distribution function G belongs to the class of stable distributions. These
stable distributions can be characterized by some parameters, of which the
most important one is the so-called characteristic exponent $\alpha \in (0,2]$.
Among these distributions only the normal (the case $\alpha = 2$) has a finite variance.
Another example of a stable distribution is the Cauchy distribution (the case $\alpha = 1$),
but for other members of this class with characteristic exponent $\alpha \in (0,1)$ or
$\alpha \in (1,2)$ no closed form of the distribution function is available yet, even in
the symmetric case where the density of a symmetric stable distribution with
characteristic exponent $\alpha$ can only be written as



$$f(u) = \frac{1}{\pi}\int_0^{\infty} \cos(tu)\, e^{-\gamma t^{\alpha}}\, dt, \quad \gamma > 0,\ \alpha \in (0,2],$$

where $\gamma$ is a scaling parameter.
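Although no closed form exists in general, such a density is easy to evaluate numerically. The sketch below assumes that the density has the form $f(u) = \frac{1}{\pi}\int_0^\infty \cos(tu)\,e^{-\gamma t^{\alpha}}\,dt$ (this parametrization is an assumption of the example), truncates the integral where the exponential factor is negligible and applies Simpson's rule. The function name and the number of steps are hypothetical choices, and the sketch is meant for $\alpha$ between 1 and 2 only.

```python
import math

def symmetric_stable_pdf(u, alpha, gamma=1.0, steps=20000):
    """Approximate (1/pi) * integral_0^inf cos(t*u) * exp(-gamma*t**alpha) dt
    by Simpson's rule on [0, T]; `steps` must be even."""
    # T is chosen such that exp(-gamma * T**alpha) = exp(-40), i.e. negligible.
    T = (40.0 / gamma) ** (1.0 / alpha)
    h = T / steps

    def g(t):
        return math.cos(t * u) * math.exp(-gamma * t ** alpha)

    total = g(0.0) + g(T)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0 / math.pi

u = 0.7
cauchy_pdf = (1.0 / math.pi) * 1.0 / (1.0 + u * u)            # Cauchy(0, 1)
normal_pdf = math.exp(-u * u / 4.0) / math.sqrt(4.0 * math.pi)  # N(0, 2)
print(symmetric_stable_pdf(u, 1.0), cauchy_pdf)
print(symmetric_stable_pdf(u, 2.0), normal_pdf)
print(symmetric_stable_pdf(u, 1.5))   # no closed form to compare with
```

Under the assumed parametrization, $\alpha = 1$ reproduces the Cauchy $(0,\gamma)$ density and $\alpha = 2$ the $N(0, 2\gamma)$ density, which gives a simple check on the numerical integration.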

Thus the usual argument for underpinning the normality assumption of
disturbance terms does not necessarily lead to the conclusion that disturbance
terms are normal, but only that a stable distribution is appropriate. Of course,
the above argument does not imply that error distributions are necessarily stable,
because the impacts of the variables not taken into account may fail to be
independent or equally distributed. We are only making a plea to break the
automatism of assuming normality of the error distribution, to recognize the
possibility that error distributions may have (nearly) infinite variances and to
use estimation methods that are able to cope with such situations. In connection
with this we note that deviations from normality of the errors of a regression
model may cause the occurrence of outliers among the residuals and that the
performance of the least squares estimation method is rather sensitive to outliers.

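Specification of the distribution of the disturbance term is only one side
of the specification problem. Often economic theory is vague about the functional
form of a relationship and in such cases usually the linear regression model is
assumed. But misspecification of the functional form may also have serious
consequences for the conclusions to be drawn. For example, if the relationship
between $\underline{y}_j$ and $\underline{x}_j$ is $\underline{y}_j = \underline{x}_j^2 + \underline{u}_j$, where the pairs $(\underline{x}_j, \underline{u}_j)$ are independent random
drawings from the bivariate normal distribution

$$N_2\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_u^2 \end{pmatrix}\right],$$

and if the model is specified as $\underline{y}_j = \beta \underline{x}_j + \underline{u}_j$, then the least squares estimator

$$\underline{b}_n = \frac{\sum_{j=1}^{n} \underline{x}_j \underline{y}_j}{\sum_{j=1}^{n} \underline{x}_j^2}
= \frac{\frac{1}{n}\sum_{j=1}^{n} \underline{x}_j^3 + \frac{1}{n}\sum_{j=1}^{n} \underline{x}_j \underline{u}_j}{\frac{1}{n}\sum_{j=1}^{n} \underline{x}_j^2}$$

converges in probability to zero because

$$\text{plim}\ \frac{1}{n}\sum_{j=1}^{n} \underline{x}_j^3 = 0, \quad \text{plim}\ \frac{1}{n}\sum_{j=1}^{n} \underline{x}_j \underline{u}_j = 0 \quad \text{and} \quad \text{plim}\ \frac{1}{n}\sum_{j=1}^{n} \underline{x}_j^2 = \sigma_x^2.$$

Thus we would conclude (when n is sufficiently large) that $\beta = 0$ and hence that
the theory involved is false. Of course, this conclusion is correct as far as
the econometric hypotheses are concerned. But usually econometricians then also
reject the economic theory involved.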
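The effect of this kind of misspecification can be made visible by simulation. The following sketch assumes, for illustration, that the true relationship is $y_j = x_j^2 + u_j$ with independent standard normal $x_j$ and $u_j$, and that a line through the origin is fitted by least squares. The function name and the sample sizes are hypothetical choices.

```python
import random

def misspecified_slope(n, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Least squares slope of y on x when the true relationship is quadratic."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma_x)
        u = rng.gauss(0.0, sigma_u)
        y = x * x + u          # assumed true relationship
        sxy += x * y           # numerator: sum of x*y
        sxx += x * x           # denominator: sum of x*x
    return sxy / sxx

for n in (100, 10000, 100000):
    print(n, round(misspecified_slope(n), 4))
```

The estimated slope drifts towards zero as n increases, although y depends strongly on x. The linear specification simply cannot pick up a relationship that is symmetric around zero.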

In the case that the functional form of the model is not supplied by economic
theory, it is very difficult to avoid misspecification of this functional form
because often the number of theoretically possible forms is infinite. From a
practical point of view the best way seems to be to specify the functional forms as
flexibly as possible, for example by using the well known Box-Cox transformation

$$\psi_{\lambda}(x) = \frac{x^{\lambda} - 1}{\lambda}, \quad x > 0$$

[Box-Cox (1964)] so that then the functional form is partly determined by the
data itself. Instead of specifying a linear model
$\underline{y}_j = \sum_{i=1}^{k} \beta_i x_{i,j} + \underline{u}_j$ or a
log-linear model $\log(\underline{y}_j) = \sum_{i=1}^{k} \beta_i \log(x_{i,j}) + \underline{u}_j$ we then put the model in the
form:

$$\psi_{\lambda_0}(\underline{y}_j) = \sum_{i=1}^{k} \beta_i\, \psi_{\lambda_i}(x_{i,j}) + \underline{u}_j, \quad j = 1,2,\dots,$$

where now $\lambda_0, \lambda_1, \dots, \lambda_k, \beta_1, \dots, \beta_k$ are the parameters to be estimated.
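A minimal sketch of the Box-Cox transformation shows how one parameter moves the functional form between the linear and the logarithmic case. The function name `box_cox` and the tolerance used to detect $\lambda = 0$ are assumptions of this example. The case $\lambda = 0$ is defined by the limit $\log(x)$.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform (x**lam - 1) / lam for x > 0, with log(x) at lam = 0."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)            # limiting case lam -> 0
    return (x ** lam - 1.0) / lam

for lam in (1.0, 0.5, 1e-6, 0.0):
    print(lam, box_cox(2.0, lam))
```

For $\lambda = 1$ the transform is x - 1, so the transformed model is linear in the original variables, and for $\lambda$ close to 0 the values approach $\log(x)$, which gives the log-linear model.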

The Box-Cox transformation is applied by Zarembka (1968), White (1972) and
Spitzer (1976) to determine the functional form of the demand function of
money, while Leech (1975) has used it to determine whether the disturbance
term of a CES-production function is additive or multiplicative. In all these
cases the maximum likelihood method is used, assuming normality of the error
distribution, and hypotheses with respect to $\lambda$ are tested by the likelihood
ratio test [see Goldfeld and Quandt (1972)]. But in view of our previous
argument about the error distribution, the normality assumption may be too
strong, hence the optimal properties of the maximum likelihood method may not
apply at all.

We can always specify our functional form as flexibly as we wish, for example
by using polynomial transformations, but then we also have to introduce more
parameters. Obviously the number of parameters that can be handled is limited
by the number of observations, which is usually small in econometrics. So in
practice linear specification will often be unavoidable. Therefore we shall deal
with classes of nonlinear models that also contain the linear case.
1.2 The purpose and scope of this study

From the previous considerations the conclusion can be drawn that there is
a need for estimation methods which are appropriate for nonlinear models as
well as robust. Malinvaud (1970, p.296) states: "One describes as robust those
statistical procedures, which show little sensitivity to the assumptions on the
stochastic variables of the model for which they are conceived". The main purpose
of this study is to provide such procedures and to study their asymptotic
properties, but we shall also pay attention to the most important existing
nonlinear estimation methods, namely nonlinear least squares and nonlinear
two-stage least squares.

Although we shall consider error distributions belonging to a more general
class than only the normal, it will be necessary to impose some restrictions
on this class. Except in the cases of nonlinear least squares and two-stage
least squares estimation, we shall mainly consider symmetric error distributions,
and when dealing with regression models with additive errors we shall moreover
assume that the error distribution is unimodal. Since symmetric stable distributions
are unimodal [see the corresponding theorem in section 2.5 below and the remark of
Doob in the appendix of Gnedenko and Kolmogorov (1954)], the symmetric unimodal case
covers the symmetric stable case.

For studying the asymptotic properties of nonlinear estimators we need much
more probability theory than in the linear case and even some contributions to
probability theory itself have to be made, especially concerning convergence
of random functions. This preliminary mathematical theory is provided by chapter 2.
In chapter 3 we consider the problem of estimating the parameters of nonlinear
regression models under various assumptions about the distribution of the errors
and the regressors. The asymptotic properties of three types of estimators are
studied, namely nonlinear least squares estimators, robust M-estimators and
weighted robust M-estimators. In chapter 4 we shall deal with estimation of a
single nonlinear implicit structural equation. We first consider the asymptotic
properties of the nonlinear two-stage least squares estimator, and then we
propose a new estimator which requires fewer instrumental variables than the
former method. In these latter two chapters it is assumed throughout that the
observations are independently distributed, but in chapter 5 this assumption is
dropped. Instead of independence, a new stability condition for nonlinear
autoregressive stochastic processes is introduced. Under this stability condition
the convergence in probability and convergence in distribution results of
chapters 3 and 4 carry over. Finally, in chapter 6 we present some applications.
2 PRELIMINARY MATHEMATICS

For understanding linear econometrics, a good background in calculus, 
statistics and linear algebra may be sufficient, but for nonlinear econometrics 
we need additional knowledge of abstract probability theory. Since this book 
is mainly written for econometricians and not for mathematicians, it is presumed 
that the reader is not completely familiar with the measure-theoretical approach
to probability theory. We shall therefore review and explain this additional
mathematics in the sections 2.1 and 2.2, in order to make this study (nearly) 
self-contained. Most of the material can be found in textbooks like Chung (1974) 
and Feller (1966), but the theorems 2.2.15 through 2.2.17 are our own elaborations 
of known results. 

Section 2.3 deals with uniform convergence of random functions on compact 
spaces. Especially the theorems 2.3.3 and 2.3.4, which are further elaborations 
of results of Jennrich (1969), are very important for us. Section 2.4 gives a 
brief review of some results on characteristic functions and stable distributions. 
Moreover, we state there the famous central limit theorem of Liapounov. Finally,
section 2.5 is devoted to properties of (symmetric) unimodal distributions.



2.1 Random variables, independence, Borel measurable functions and mathematical
expectation

2.1.1 Measure theoretical foundation of probability theory 

In this section we shall give a brief outline of the measure-theoretical
foundation of probability theory. When dealing with convergence of random variables
and uniform convergence of random functions, measure-theoretical arguments are
unavoidable. These convergence concepts will play a key role in this study.

The basic concept of probability theory is the probability space. This is a
triple $\{\Omega, \mathcal{F}, P\}$ consisting of:

1. an abstract non-empty set $\Omega$, called the sample space. We do not impose any
conditions on this set.

2. a non-empty collection $\mathcal{F}$ of subsets of $\Omega$, having the following two properties:

- if $E \in \mathcal{F}$, then $E^c \in \mathcal{F}$, $\qquad$ (2.1.1)

where $E^c$ denotes the complement of the subset E with respect to $\Omega$:
$E^c = \Omega \setminus E$;

- if $E_j \in \mathcal{F}$ for $j = 1,2,\dots$, then $\bigcup_{j=1}^{\infty} E_j \in \mathcal{F}$. $\qquad$ (2.1.2)

These two properties make $\mathcal{F}$, by definition, a Borelfield of subsets of $\Omega$.

3. a probability measure P on $\{\Omega, \mathcal{F}\}$. This is a real valued set function on $\mathcal{F}$
such that:

- $P(\Omega) = 1$ $\qquad$ (2.1.3)

- $P(E) \ge 0$ for all $E \in \mathcal{F}$ $\qquad$ (2.1.4)

- $E_j \in \mathcal{F}$ for $j = 1,2,\dots$ and $E_{j_1} \cap E_{j_2} = \emptyset$ if $j_1 \ne j_2$ imply

$$P\left(\bigcup_{j=1}^{\infty} E_j\right) = \sum_{j=1}^{\infty} P(E_j). \qquad (2.1.5)$$

In words: the probability of any countable union of disjoint sets in $\mathcal{F}$ equals the
sum of the probabilities of each of these sets.
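On a finite sample space the three ingredients of a probability space can be written out and checked mechanically. The following sketch is an illustration only: it assumes a fair die as sample space, takes the collection of all subsets as Borelfield and uses the uniform measure. All names are hypothetical, and on a finite space closure under pairwise unions replaces the countable unions of (2.1.2).

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))                     # sample space of one die

def power_set(s):
    items = sorted(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_borelfield(field, omega):
    """Check (2.1.1) and, for a finite collection, (2.1.2) via pairwise unions."""
    if not field:
        return False
    closed_complement = all(omega - e in field for e in field)
    closed_union = all(e | f in field for e in field for f in field)
    return closed_complement and closed_union

def prob(event):
    """Uniform probability measure: P(E) = |E| / |omega|."""
    return Fraction(len(event), len(omega))

field = power_set(omega)
even, low = frozenset({2, 4, 6}), frozenset({1, 3})

print(is_borelfield(field, omega))                  # (2.1.1) and (2.1.2)
print(prob(omega) == 1)                             # (2.1.3)
print(all(prob(e) >= 0 for e in field))             # (2.1.4)
print(prob(even | low) == prob(even) + prob(low))   # (2.1.5) for disjoint sets
```

A collection such as $\{\emptyset, \{1\}, \Omega\}$ fails the check, because the complement of $\{1\}$ is missing.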



Now let $\underline{x}$ be a random variable *) and let F be its distribution function. In the
measure-theoretical approach of probability theory a random variable is
considered as a real valued function on the set $\Omega$ denoted by:

$$\underline{x} = x(\cdot)$$

with value $x(\omega)$ at $\omega \in \Omega$ such that for every real number t:

$$\{\omega \in \Omega : x(\omega) \le t\} \in \mathcal{F}.$$

The distribution function F with value F(t) at $t \in R$ is then defined by

$$F(t) = P(\{\omega \in \Omega : x(\omega) \le t\}),$$

which will often be denoted by the short-hand notation:

$$F(t) = P(\underline{x} \le t).$$

It is not hard to see that the axioms (2.1.1) and (2.1.2) imply:

$$E_j \in \mathcal{F} \text{ for } j = 1,2,\dots \ \Rightarrow\ \bigcap_{j=1}^{\infty} E_j \in \mathcal{F}$$

and that from (2.1.3), (2.1.4) and (2.1.5) follow:

$$P(\emptyset) = 0$$

$$P(E^c) = 1 - P(E)$$

$$P(E \cup F) + P(E \cap F) = P(E) + P(F)$$

$$E \subset F \ \Rightarrow\ P(E) \le P(F \setminus E) + P(E) = P(F)$$

$$E_n \subset E_{n+1},\ E = \bigcup_{n=1}^{\infty} E_n \ \Rightarrow\ P(E_n) \to P(E)$$

$$E_n \supset E_{n+1},\ E = \bigcap_{n=1}^{\infty} E_n \ \Rightarrow\ P(E_n) \to P(E)$$

$$P\left(\bigcup_{j=1}^{\infty} E_j\right) \le \sum_{j=1}^{\infty} P(E_j),$$

where all sets involved are members of $\mathcal{F}$. Moreover, the distribution
function F(t) is right continuous: $F(t) = \lim_{\varepsilon \downarrow 0} F(t+\varepsilon)$, as is easily verified,
and it satisfies

$$F(\infty) = \lim_{t \to \infty} F(t) = 1, \quad F(-\infty) = \lim_{t \to -\infty} F(t) = 0.$$

Furthermore, by F(t-) we denote

$$F(t-) = \lim_{\varepsilon \downarrow 0} F(t-\varepsilon),$$

which clearly satisfies $F(t-) \le F(t)$.

*) We recall that random variables, -vectors and -functions are underlined.



A finite dimensional random vector can now be defined as a vector with
random variables as components, where these random components are assumed to
be defined on a common probability space.

A complex random variable, $\underline{z} = \underline{x} + i\underline{y}$, is a complex valued function

$$\underline{z} = z(\cdot) = x(\cdot) + i\,y(\cdot)$$

on $\Omega$ with real valued random variables $\underline{x}$ and $\underline{y}$, i.e.

$$\{\omega \in \Omega : x(\omega) \le t\} \in \mathcal{F}, \quad \{\omega \in \Omega : y(\omega) \le t\} \in \mathcal{F}$$

for any real number t.



Next we shall construct a Borelfield $\mathcal{B}$ of subsets of $R^k$ such that for every
set $B \in \mathcal{B}$ and any k-dimensional random vector $\underline{x}$ on a probability space $\{\Omega, \mathcal{F}, P\}$
we have

$$\{\omega \in \Omega : x(\omega) \in B\} \in \mathcal{F}, \qquad (2.1.6)$$

because only for such subsets B of $R^k$ can we define the probability

$$P(\underline{x} \in B) = P(\{\omega \in \Omega : x(\omega) \in B\}), \qquad (2.1.7)$$

since probability is only defined for members of the Borelfield $\mathcal{F}$.

Let $\mathcal{C}$ be the collection of subsets of $R^k$ of the type $\times_{m=1}^{k} (-\infty, t_m]$, $t_m \in R$, and let
$\mathcal{A}$ be the Borelfield of all subsets of $R^k$.
Clearly we have $\mathcal{C} \subset \mathcal{A}$, which means that $E \in \mathcal{C} \Rightarrow E \in \mathcal{A}$. But next to $\mathcal{A}$ there may be other
Borelfields of subsets of $R^k$ with this property, say $\mathcal{F}_{\alpha}$, $\alpha \in A$, where A is an
index set. Assuming that all Borelfields containing $\mathcal{C}$ are represented this way,
we then have a non-empty collection of Borelfields $\mathcal{F}_{\alpha}$, $\alpha \in A$, of subsets of $R^k$
such that $\mathcal{C} \subset \mathcal{F}_{\alpha}$ for any $\alpha \in A$. Now consider the collection $\mathcal{B} = \bigcap_{\alpha \in A} \mathcal{F}_{\alpha}$. Since each
$\mathcal{F}_{\alpha}$ is a Borelfield, it follows that this collection $\mathcal{B}$ is a Borelfield of subsets
of $R^k$ and since $\mathcal{C}$ is contained in each $\mathcal{F}_{\alpha}$ it follows that $\mathcal{C} \subset \mathcal{B}$. We shall say that
the Borelfield $\mathcal{B}$ is the minimal Borelfield containing the collection $\mathcal{C}$, and for
this particular collection $\mathcal{C}$ it is called the Euclidean Borelfield. Summarizing:

Definition 2.1.1. Let $\mathcal{C}$ be any collection of subsets of a set V and let the
Borelfields of subsets of V containing $\mathcal{C}$ be $\mathcal{F}_{\alpha}$, $\alpha \in A$.
Then $\mathcal{B} = \bigcap_{\alpha \in A} \mathcal{F}_{\alpha}$ is called the minimal Borelfield containing $\mathcal{C}$.

Definition 2.1.2. Let $\mathcal{C}$ be the collection of subsets of $R^k$ of the type
$\times_{m=1}^{k} (-\infty, t_m]$, $t_m \in R$. The minimal Borelfield $\mathcal{B}$ containing this collection
is called the Euclidean Borelfield and the members of $\mathcal{B}$ are called the
Borel sets.
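On a finite set the minimal Borelfield containing a given collection can be computed explicitly, which may help to see what "minimal" means. The sketch below is a finite analogue only, not the Euclidean construction itself: it assumes the space {1, 2, 3, 4} and keeps adding complements and unions until nothing new appears. The function name is hypothetical.

```python
def minimal_borelfield(collection, space):
    """Smallest collection containing `collection` that is closed under
    complements and unions (finite space, so finite unions suffice)."""
    space = frozenset(space)
    # Every Borelfield contains the whole space, since E and its complement do.
    field = {frozenset(s) for s in collection} | {space}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for e in current:
            candidates = [space - e] + [e | f for f in current]
            for c in candidates:
                if c not in field:
                    field.add(c)
                    changed = True
    return field

# Finite analogue of the half-lines (-inf, t]: the sets {1,...,t}, t = 1,...,4.
space = {1, 2, 3, 4}
half_lines = [set(range(1, t + 1)) for t in sorted(space)]
generated = minimal_borelfield(half_lines, space)
print(len(generated))          # all 16 subsets of {1, 2, 3, 4}

coarse = minimal_borelfield([{1, 2}], space)
print(sorted(sorted(s) for s in coarse))
```

The "half-lines" generate every subset, just as the sets $(-\infty, t]$ generate all Borel sets, whereas the single set {1, 2} only generates a Borelfield with four members.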



We show that for any Borel set B and any r.v. $\underline{x}$ on $\{\Omega, \mathcal{F}, P\}$, (2.1.6) is
satisfied. Let $\mathcal{D}$ be the collection of all Borel sets B such that (2.1.6)
is satisfied. Then $\mathcal{D} \subset \mathcal{B}$. If $\mathcal{D}$ is also a Borelfield then $\mathcal{B} \subset \mathcal{D}$ (and
hence $\mathcal{D} = \mathcal{B}$) because $\mathcal{B}$ is the minimal Borelfield containing the sets
$\times_{m=1}^{k} (-\infty, t_m]$, while obviously the collection $\mathcal{D}$ contains these sets. So it
suffices to prove that $\mathcal{D}$ is a Borelfield. But this is not hard to do and is
therefore left to the reader.

Theorem 2.1.1. For any random variable or vector $\underline{x}$ on $\{\Omega, \mathcal{F}, P\}$ and any Borel
set B we have $\{\omega \in \Omega : x(\omega) \in B\} \in \mathcal{F}$.

Consequently the definition (2.1.7) is meaningful for Borel sets. In fact, by
defining a measure $\mu$ on the Euclidean Borelfield $\mathcal{B}$ as

$$\mu(B) = P(\underline{x} \in B) = P(\{\omega \in \Omega : x(\omega) \in B\})$$

for any Borel set B we have created a probability measure on $\mathcal{B}$. This probability
measure $\mu$ is often referred to as the probability measure induced by (the random
variable or -vector) $\underline{x}$.

We are now able to define a (joint) distribution function on $R^k$. Let $\underline{x}$ be
a random vector in $R^k$ defined on a probability space $\{\Omega, \mathcal{F}, P\}$. The product sets
$\times_{j=1}^{k} (-\infty, t_j]$ are Borel sets, where the $t_j$'s are the components of a vector $t \in R^k$.
Thus:

$$\{\omega \in \Omega : x(\omega) \in \times_{j=1}^{k} (-\infty, t_j]\} \in \mathcal{F}.$$

The (joint) distribution function F, say, of $\underline{x}$ is now defined for all $t \in R^k$ by:

$$F(t) = P(\{\omega \in \Omega : x(\omega) \in \times_{j=1}^{k} (-\infty, t_j]\}),$$

which, however, will also often be denoted by the shorthand notation:

$$F(t) = P(\underline{x} \le t).$$
2.1.2 Independence

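Now let $\underline{x}_1, \underline{x}_2, \dots$ be a sequence of random variables with corresponding
probability spaces $\{\Omega_1, \mathcal{F}_1, P_1\}, \{\Omega_2, \mathcal{F}_2, P_2\}, \dots$. Then it is possible
to construct a new probability space, $\{\Omega, \mathcal{F}, P\}$, say, such that the $\underline{x}_j$'s can be
regarded as independent random variables on $\{\Omega, \mathcal{F}, P\}$ [see Chung (1974, section 3.3)].
Independence means:

Definition 2.1.3. Let $\underline{x}_1, \underline{x}_2, \dots$ be random vectors in $R^{k_1}, R^{k_2}, \dots$, respectively.
This sequence is called independent if for any finite sequence $(B_j)$
of Borel sets (with $B_j$ a Borel set in $R^{k_j}$)

$$P\left(\bigcap_{j=1}^{m} \{\underline{x}_j \in B_j\}\right) = \prod_{j=1}^{m} P(\underline{x}_j \in B_j) \quad \text{for all } m \ge 1, \qquad (2.1.8)$$

and $(\underline{x}_j)_{j=1}^{\infty}$ is called mutually (or pairwise) independent if for $j_1 \ne j_2$,
$\underline{x}_{j_1}$ and $\underline{x}_{j_2}$ are independent.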
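For random variables on a finite sample space the product rule (2.1.8) can be verified by enumeration. The sketch below assumes two fair dice as sample space and is an illustration only. For discrete random variables it suffices to check the rule on one-point sets, because the probability of any other Borel set is a sum over the values it contains. The names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 outcomes of two dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P of the set of outcomes w for which event(w) is True."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def x1(w):
    return w[0]            # first die

def x2(w):
    return w[1]            # second die

def s(w):
    return w[0] + w[1]     # sum of both dice

def independent(u, v):
    """Check P(u = a, v = b) = P(u = a) P(v = b) for all values a and b."""
    values_u = {u(w) for w in omega}
    values_v = {v(w) for w in omega}
    return all(
        prob(lambda w: u(w) == a and v(w) == b)
        == prob(lambda w: u(w) == a) * prob(lambda w: v(w) == b)
        for a in values_u for b in values_v)

print(independent(x1, x2))   # the two dice
print(independent(x1, s))    # the first die and the sum
```

The two dice satisfy the product rule, while the first die and the sum do not: for instance, the sum can only equal 12 if the first die shows 6.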



2.1.3 Borel measurable functions 

If $\underline{x}$ is a r.v. and f(x) is a real function on R, is then $f(\underline{x})$ a r.v.?
The answer is: not always. There are functions [see for example Royden
(1968, problem 3.28)] for which this is not the case. The condition for
$f(\underline{x})$ being a r.v. is that for all $t \in R$ we have $\{\omega \in \Omega : f(x(\omega)) \le t\} \in \mathcal{F}$,
where $\{\Omega, \mathcal{F}, P\}$ is the probability space involved, and referring to theorem
2.1.1 this will be the case if for every $t \in R$ the set $\{x \in R : f(x) \le t\}$ is
a Borel set. Functions satisfying the latter condition are called Borel
measurable. Now consider a real function $f(x_1, \dots, x_k)$ on $R^k$ and r.v.'s
$\underline{x}_1, \dots, \underline{x}_k$ on $\{\Omega, \mathcal{F}, P\}$. If for every $t \in R$ the set $\{(x_1, \dots, x_k) \in R^k : f(x_1, \dots, x_k) \le t\}$
is a Borel set $B_t$, say, in $R^k$ then $\{\omega \in \Omega : f(x_1(\omega), \dots, x_k(\omega)) \le t\} \in \mathcal{F}$ for every
$t \in R$ and hence $f(\underline{x}_1, \dots, \underline{x}_k)$ is a r.v. Also such functions are called Borel
measurable:

Definition 2.1.4. A real function $f(x_1, \dots, x_k)$ on $R^k$ is called
Borel measurable if for every $t \in R$ the set
$\{(x_1, \dots, x_k) \in R^k : f(x_1, \dots, x_k) \le t\}$ is a Borel set in $R^k$.



A first example of a Borel measurable function is the so called
simple function:

Definition 2.1.5. A real function $f(x_1, \dots, x_k)$ on $R^k$ is called a
simple function if there are finite real numbers $b_1, \dots, b_n$ and
Borel sets $B_j$, $j = 1,2,\dots,n$, with $B_{j_1} \cap B_{j_2} = \emptyset$ if $j_1 \ne j_2$, such that

$$f(x_1, \dots, x_k) = \sum_{j=1}^{n} b_j\, \chi_j(x_1, \dots, x_k),$$

where

$$\chi_j(x_1, \dots, x_k) = \begin{cases} 1 & \text{if } (x_1, \dots, x_k) \in B_j \\ 0 & \text{if } (x_1, \dots, x_k) \notin B_j. \end{cases}$$

If we realize that the set $\{x \in R^k : f(x) \le t\}$ for a simple function f is always a
finite union of Borel sets, we have:

Theorem 2.1.2. Simple functions are Borel measurable.

From this result we can derive other Borel measurable functions using the 
following theorem. 

Theorem 2.1.3. Let $f_1, f_2, \dots$ be a sequence of Borel measurable functions
on $R^k$. Then the functions $\sup\{f_1, \dots, f_n\}$, $\inf\{f_1, \dots, f_n\}$,
$\sup_n f_n$, $\inf_n f_n$, $\limsup_n f_n$ and $\liminf_n f_n$ are also Borel measurable.

Proof: We only consider the case k=1. Moreover, it is not hard to see
that if f is Borel measurable then -f is Borel measurable, hence it
suffices to prove the theorem for the "sup"-cases. Let
$h_n(x) = \sup\{f_1(x), \dots, f_n(x)\}$. Then

$$\{x \in R : h_n(x) \le t\} = \bigcap_{j=1}^{n} \{x \in R : f_j(x) \le t\},$$

which is a Borel set since the $f_j$ are Borel measurable. Moreover, replacing
n with $\infty$ we see that $\sup_n f_n(x)$ is Borel measurable. Since
$\limsup_n f_n(x) = \inf_n \sup_{k \ge n} f_k(x)$ and since $\inf_n g_n(x)$ is Borel measurable if each
$g_n$ is, it follows directly that $\limsup_n f_n$ is Borel measurable. □

Remark. From this theorem we can also conclude that if $\underline{x}_1, \dots, \underline{x}_n, \dots$ are
random variables, so are $\sup\{\underline{x}_1, \dots, \underline{x}_n\}$, $\inf\{\underline{x}_1, \dots, \underline{x}_n\}$, $\sup_n \underline{x}_n$, $\inf_n \underline{x}_n$,
$\limsup_n \underline{x}_n$ and $\liminf_n \underline{x}_n$.

From the theorems 2.1.2 and 2.1.3 it follows:

Theorem 2.1.4. A continuous real function on $R^k$ is Borel measurable.

Proof: Let f(x) be a continuous real function on $R^k$. It is not hard to show that
the functions $f_n(x)$ defined by

$$f_n(x) = \begin{cases} f(x) & \text{if } |x| \le n \\ 0 & \text{if } |x| > n \end{cases} \qquad n = 1,2,\dots$$

can, for fixed n, be written as limits of simple functions, so that by theorem
2.1.3 the $f_n(x)$'s are Borel measurable. Since

$$f(x) = \lim_n f_n(x) = \limsup_n f_n(x) = \liminf_n f_n(x)$$

it follows from theorem 2.1.3 that f is Borel measurable. □

Let f be any Borel measurable function on $R^k$. Since the functions
max{0,x} and max{0,-x} are continuous and hence Borel measurable, it
follows that

$$f^+(x_1, \dots, x_k) = \max\{0, f(x_1, \dots, x_k)\}, \quad f^-(x_1, \dots, x_k) = \max\{0, -f(x_1, \dots, x_k)\}$$

are non-negative Borel measurable functions. Moreover we obviously have

$$f(x_1, \dots, x_k) = f^+(x_1, \dots, x_k) - f^-(x_1, \dots, x_k). \qquad (2.1.9)$$

This representation is important because it means that we can often focus our
attention on non-negative Borel measurable functions without loss of generality.
Thus the following theorem gives a full characterization of Borel measurable
functions:

Theorem 2.1.5. A non-negative real function f on $R^k$ is Borel measurable
if and only if there is a non-decreasing sequence $(\varphi_n)$ of simple functions on $R^k$
satisfying $0 \le \varphi_1 \le \varphi_2 \le \dots$,

$$\lim_{n \to \infty} \varphi_n(x_1, \dots, x_k) = f(x_1, \dots, x_k) \quad \text{for each } (x_1, \dots, x_k) \in R^k.$$

Proof: Take for given non-negative Borel measurable f

$$\varphi_n(x_1, \dots, x_k) = \begin{cases} \dfrac{m-1}{2^n} & \text{if } \dfrac{m-1}{2^n} \le f(x_1, \dots, x_k) < \dfrac{m}{2^n} \\ n & \text{otherwise} \end{cases}$$

for integers m with $1 \le m \le n 2^n$. Then the $\varphi_n$ have all the required properties.
Since by theorem 2.1.2 the simple functions are Borel measurable, the limit
is Borel measurable by theorem 2.1.3. □
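The construction used in this proof can be tried out numerically. The sketch below assumes that the n-th approximation rounds the value of f down to the nearest multiple of $2^{-n}$ as long as this value is below n, and equals n otherwise. The function $f(x) = x^2$ and the point x = 1.3 are arbitrary choices made for the example, and the name `phi` is hypothetical.

```python
import math

def phi(n, value):
    """n-th dyadic simple-function approximation of a non-negative value f(x)."""
    if value >= n:
        return float(n)                        # the "otherwise" branch
    return math.floor(value * 2 ** n) / 2 ** n   # (m-1)/2^n with (m-1)/2^n <= value < m/2^n

def f(x):
    return x * x

x = 1.3
approximations = [phi(n, f(x)) for n in range(1, 11)]
print(approximations)
```

The printed values are non-decreasing in n, never exceed f(x) and come within $2^{-n}$ of f(x) as soon as n exceeds f(x). Each $\varphi_n$ takes only finitely many values, which is what makes it a simple function.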







2.1.4 Mathematical expectation 

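The above theorem can be used for defining the mathematical expectation
of $f(\underline{x}_1, \dots, \underline{x}_k)$, where f is a Borel measurable function on $R^k$ and
$\underline{x}_1, \dots, \underline{x}_k$ are r.v.'s on a probability space $\{\Omega, \mathcal{F}, P\}$, as some limit of
mathematical expectations of simple functions. The latter expectations are
defined as follows:

Definition 2.1.6. Let $f(x_1, \dots, x_k)$ be the simple function on $R^k$ as
defined in definition 2.1.5 and let $\underline{x}_1, \dots, \underline{x}_k$ be r.v.'s on a probability
space $\{\Omega, \mathcal{F}, P\}$. Then the mathematical expectation of $f(\underline{x}_1, \dots, \underline{x}_k)$ is
defined by:

$$E f(\underline{x}_1, \dots, \underline{x}_k) = \sum_{j=1}^{n} b_j\, P(\{\omega \in \Omega : (x_1(\omega), \dots, x_k(\omega)) \in B_j\}).$$

For any non-negative Borel measurable function we define:

Definition 2.1.7. Let $f(x_1, \dots, x_k)$ be a non-negative Borel measurable
function on $R^k$ and let $\underline{x}_1, \dots, \underline{x}_k$ be r.v.'s on a common probability space.
Then:

$$E f(\underline{x}_1, \dots, \underline{x}_k) = \sup E \psi(\underline{x}_1, \dots, \underline{x}_k),$$

where the supremum is taken over all the simple functions $\psi$ satisfying:

$$0 \le \psi(x_1, \dots, x_k) \le f(x_1, \dots, x_k) \quad \text{for all } (x_1, \dots, x_k) \in R^k.$$

Using the representation (2.1.9) we now have:

Definition 2.1.8. Let $f(x_1, \dots, x_k)$ be a Borel measurable function on
$R^k$ and let $\underline{x}_1, \dots, \underline{x}_k$ be r.v.'s on a common probability space $\{\Omega, \mathcal{F}, P\}$.
If $E f^+(\underline{x}_1, \dots, \underline{x}_k) < \infty$ and/or $E f^-(\underline{x}_1, \dots, \underline{x}_k) < \infty$ then:

$$E f(\underline{x}_1, \dots, \underline{x}_k) = E f^+(\underline{x}_1, \dots, \underline{x}_k) - E f^-(\underline{x}_1, \dots, \underline{x}_k),$$

which is also denoted by the general integral with respect to the measure P

$$E f(\underline{x}_1, \dots, \underline{x}_k) = \int f(x_1(\omega), \dots, x_k(\omega))\, P(d\omega) = \int f(\underline{x}_1, \dots, \underline{x}_k)\, dP$$

(the last integral is only a short-hand notation of the first one).
If both $E f^+(\underline{x}_1, \dots, \underline{x}_k) = \infty$ and $E f^-(\underline{x}_1, \dots, \underline{x}_k) = \infty$,
the mathematical expectation and the corresponding integral are
undefined. Finally, for any set $A \in \mathcal{F}$ we define

$$\int_A f(x_1(\omega), \dots, x_k(\omega))\, P(d\omega) = \int y(\omega)\, P(d\omega),$$

where:

$$y(\omega) = \begin{cases} f(x_1(\omega), \dots, x_k(\omega)) & \text{if } \omega \in A \\ 0 & \text{if } \omega \in \Omega \setminus A, \end{cases}$$

provided that the latter integral is defined.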
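The way the expectation of a non-negative function is built up from expectations of simple functions can be followed in a concrete case. The sketch below assumes, purely for illustration, a standard exponential random variable and f(x) = x, so that the probabilities of the sets $B_j$ are known in closed form and the expectation equals 1. It computes $\sum_j b_j P(\underline{x} \in B_j)$ for dyadic simple functions that lie below f. The function names are hypothetical.

```python
import math

def prob_interval(a, b):
    """P(a <= x < b) for a standard exponential r.v. (assumed for illustration)."""
    return math.exp(-a) - math.exp(-b)

def expectation_of_phi_n(n):
    """E phi_n(x) = sum_j b_j P(x in B_j), where phi_n is the dyadic simple
    function approximating f(x) = x from below."""
    h = 2.0 ** -n
    total = 0.0
    for m in range(1, n * 2 ** n + 1):
        # b_j = (m-1)*h on the set B_j = {(m-1)*h <= x < m*h}
        total += (m - 1) * h * prob_interval((m - 1) * h, m * h)
    total += n * math.exp(-n)      # phi_n equals n on the set {x >= n}
    return total

values = [expectation_of_phi_n(n) for n in range(1, 11)]
print(values)
```

The values increase with n and approach 1 from below, in line with the supremum in definition 2.1.7.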

The mathematical expectation is often denoted by the classical Riemann-
Stieltjes integral [see Rudin (1976)]:

$$E f(\underline{x}_1, \dots, \underline{x}_k) = \int f(x_1, \dots, x_k)\, dF(x_1, \dots, x_k), \qquad (2.1.10)$$

where F is the joint distribution function of the random vector $(\underline{x}_1, \dots, \underline{x}_k)$.
If the integral (2.1.10) exists for a Borel function f then it equals the
mathematical expectation because then the function f can be written as a
pointwise limit of step functions [compare Rudin (1976, chapter 6)], while
step functions are obviously simple functions. Referring to definition 2.1.7 we
thus have:

Theorem 2.1.6. Let $\underline{x}_1, \dots, \underline{x}_k$ be r.v.'s on a probability space $\{\Omega, \mathcal{F}, P\}$
and let $F(x_1, \dots, x_k)$ be the joint distribution function of these
r.v.'s. If for a Borel measurable function $f(x_1, \dots, x_k)$ on $R^k$ the Riemann-Stieltjes
integral $\int f(x_1, \dots, x_k)\, dF(x_1, \dots, x_k)$ is defined then

$$E f(\underline{x}_1, \dots, \underline{x}_k) = \int f(x_1, \dots, x_k)\, dF(x_1, \dots, x_k).$$



The general integral with respect to a probability measure has all the
properties of the classical Riemann-Stieltjes integral with respect to the
distribution function F, and in the following we shall assume that the reader
is familiar with its elementary properties [otherwise, see for example Chung (1974)].
Some of these properties are listed below since they are frequently used in this
study. These properties are:

1) Let $x$ and $y$ be random variables defined on the probability space
$\{\Omega,\mathcal{F},P\}$. Let $A$, $A_n$ be sets in $\mathcal{F}$. Let $\alpha$ and $\beta$ be real
numbers. We have

(a) $\int_A (\alpha x + \beta y)\,dP = \alpha \int_A x\,dP + \beta \int_A y\,dP$,
provided that the right side is meaningful.

(b) If the $A_n$'s are disjoint, then
$\int_{\cup_n A_n} x\,dP = \sum_n \int_{A_n} x\,dP$.

(c) If $x(\omega) \ge 0$ for every $\omega \in A$, then $\int_A x\,dP \ge 0$.

(d) If $x(\omega) \le y(\omega)$ for every $\omega \in A$, then $\int_A x\,dP \le \int_A y\,dP$.

(e) $\left|\int_A x\,dP\right| \le \int_A |x|\,dP$,
provided that the left side integral is defined.

These properties are not hard to verify from the definitions 2.1.7 and 2.1.8. 
See also Kolmogorov and Fomin (1961), chapter VII, for related properties of 
the Lebesgue integral. 



2) If $x_1, x_2, \dots, x_n$ are independent random variables or vectors and if
$f_1(x),\dots,f_n(x)$ are Borel measurable functions, then

$$E \prod_{j=1}^{n} f_j(x_j) = \prod_{j=1}^{n} E f_j(x_j), \qquad n \le \infty,$$

provided that the right hand expectations are finite.

3) Chebishev's inequality. If $\phi$ is a Borel measurable function on $R^1$ such
that $\phi(x)$ is positive and monotone increasing on $(0,\infty)$ and
$\phi(x) = \phi(-x)$, then for every r.v. $x$ and every $\delta > 0$ we have

$$P(|x| > \delta) \le \frac{E\,\phi(x)}{\phi(\delta)}.$$

4) Hölder's inequality. Let $x$ and $y$ be r.v. Then

$$|E\,xy| \le E|xy| \le \{E|x|^p\}^{1/p}\,\{E|y|^q\}^{1/q}$$

for $p > 1$ and $\frac1p + \frac1q = 1$. (If p = 2 we have the well-known
Cauchy-Schwarz inequality.)

5) Liapounov's inequality. Let $x$ be a r.v. Then for $1 \le p \le q < \infty$,

$$\{E|x|^p\}^{1/p} \le \{E|x|^q\}^{1/q}.$$

(This follows straightforwardly from Hölder's inequality by putting $y = 1$ and
replacing $x$ with $|x|^p$ and $p$ with $q/p$.)

Since assertion 2) is a standard result and the proof of Chebishev's
inequality is nearly trivial, we shall only prove Hölder's inequality:
Without loss of generality we may assume $E|x|^p > 0$ and $E|y|^q > 0$ because
otherwise the inequality is nearly trivial. From the convexity of the function
$-\log x$ on $(0,\infty)$ it follows that for $a, b \in (0,\infty)$:

$$\frac1p \log a + \frac1q \log b \le \log\Big(\frac1p a + \frac1q b\Big),$$

or equivalently

$$a^{1/p}\, b^{1/q} \le \frac1p a + \frac1q b.$$

Substitute $a = |x|^p / E|x|^p$ and $b = |y|^q / E|y|^q$.
Then by taking expectations we obtain the inequality involved.



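Since the mean of a finite number of non-random variables in R can
be considered as a mathematical expectation, it follows from Hölder's
inequality that for real numbers $x_1,\dots,x_n$, $y_1,\dots,y_n$:

$$\Big|\frac1n \sum_{j=1}^{n} x_j y_j\Big| \le \Big\{\frac1n \sum_{j=1}^{n} |x_j|^p\Big\}^{1/p} \Big\{\frac1n \sum_{j=1}^{n} |y_j|^q\Big\}^{1/q}, \quad p > 1,\ \frac1p + \frac1q = 1, \qquad (2.1.11)$$

and consequently, taking $y_j = 1$,

$$\Big|\sum_{j=1}^{n} x_j\Big|^p \le n^{p-1} \sum_{j=1}^{n} |x_j|^p, \quad p > 1. \qquad (2.1.12)$$

This last inequality is a sharpening of the following trivial but useful
inequality:

$$\Big|\sum_{j=1}^{n} x_j\Big|^p \le \Big\{n \max_{j} |x_j|\Big\}^p \le n^p \sum_{j=1}^{n} |x_j|^p, \quad p > 0. \qquad (2.1.13)$$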
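As a quick numerical sanity check of Hölder's and Liapounov's inequalities in their finite-mean form, the following Python sketch treats the arithmetic mean of a small sample as a mathematical expectation. The sample values, the exponents and the helper names `mean` and `norm_p` are illustrative assumptions and do not come from the text.

```python
def mean(values):
    """Arithmetic mean, playing the role of the expectation E."""
    return sum(values) / len(values)


def norm_p(values, p):
    """{(1/n) * sum |v_j|^p}^(1/p), i.e. {E|v|^p}^(1/p) for the sample."""
    return mean([abs(v) ** p for v in values]) ** (1.0 / p)


# Illustrative data (assumption, not from the text).
xs = [0.5, -1.5, 2.0, 3.0, -0.25]
ys = [1.0, 4.0, -2.0, 0.5, 1.5]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1

# Hoelder: |E xy| <= E|xy| <= {E|x|^p}^(1/p) {E|y|^q}^(1/q)
lhs = abs(mean([x * y for x, y in zip(xs, ys)]))
mid = mean([abs(x * y) for x, y in zip(xs, ys)])
rhs = norm_p(xs, p) * norm_p(ys, q)
holder_ok = lhs <= mid <= rhs + 1e-12

# Liapounov: {E|x|^p}^(1/p) is non-decreasing in p.
liapounov_ok = norm_p(xs, 2.0) <= norm_p(xs, 3.5) + 1e-12

# Hoelder with y_j = 1: |sum x_j|^p <= n^(p-1) * sum |x_j|^p.
n = len(xs)
sum_bound_ok = abs(sum(xs)) ** p <= n ** (p - 1) * sum(abs(x) ** p for x in xs) + 1e-12

print(holder_ok, liapounov_ok, sum_bound_ok)
```

The small tolerance `1e-12` only guards against floating-point rounding; the inequalities themselves hold exactly.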



Finally we shall also use the following result [see the proof of theorem 2.2.6].

Theorem 2.1.7. Let $x$ be a r.v. on $\{\Omega,\mathcal{F},P\}$ such that $E|x| < \infty$.
Let $(A_n)$ be a sequence of sets in $\mathcal{F}$ such that $\lim_{n\to\infty} P(A_n) = 0$.
Then:

$$\lim_{n\to\infty} \int_{A_n} |x(\omega)|\,P(d\omega) = 0.$$



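Proof: Let $\Gamma_n = \{\omega \in \Omega : n \le |x(\omega)| < n+1\}$.
From the fact that

$$\int |x(\omega)|\,P(d\omega) = \int_{\cup_{n=0}^{\infty} \Gamma_n} |x(\omega)|\,P(d\omega) = \sum_{n=0}^{\infty} \int_{\Gamma_n} |x(\omega)|\,P(d\omega) < \infty$$

we conclude that

$$\int_{\{|x(\omega)| \ge k\}} |x(\omega)|\,P(d\omega) = \sum_{n=k}^{\infty} \int_{\Gamma_n} |x(\omega)|\,P(d\omega) \to 0 \quad \text{for } k \to \infty.$$

The theorem follows now from

$$\int_{A_n} |x(\omega)|\,P(d\omega) = \int_{A_n \cap \{|x| \ge k\}} |x(\omega)|\,P(d\omega) + \int_{A_n \cap \{|x| < k\}} |x(\omega)|\,P(d\omega)$$
$$\le \int_{\{|x| \ge k\}} |x(\omega)|\,P(d\omega) + \int_{A_n} k\,P(d\omega) = \int_{\{|x| \ge k\}} |x(\omega)|\,P(d\omega) + k\,P(A_n). \quad \Box$$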


2.2. Convergence of random variables and distributions
2.2.1 Weak and strong convergence of random variables

In this section we shall deal with three important convergence concepts,
namely convergence in probability, convergence almost surely and convergence
in distribution. The first concept is well known:



Definition 2.2.1. Let $(x_n)$ be a sequence of r.v.'s. We say that $x_n$ converges
in probability to a r.v. $x$ if for every $\varepsilon > 0$

$$\lim_{n\to\infty} P(|x_n - x| \le \varepsilon) = 1,$$

and we write: $x_n \to x$ in pr. or $\operatorname{plim} x_n = x$.

However, a much stronger convergence concept is: 



Definition 2.2.2. Let $(x_n)$ be a sequence of r.v.'s on a common probability space
$\{\Omega,\mathcal{F},P\}$. We say that $x_n$ converges almost surely (a.s.) to a r.v. $x$ (defined
on $\{\Omega,\mathcal{F},P\}$) if there is a null set $N \in \mathcal{F}$ (that is a set in $\mathcal{F}$ satisfying $P(N)=0$)
such that for every $\omega \in \Omega\setminus N$:

$$\lim_{n\to\infty} x_n(\omega) = x(\omega),$$

and we write: $x_n \to x$ a.s. or $\lim_{n\to\infty} x_n = x$ a.s.



A useful criterion for almost sure convergence of random variables is given
by the following theorem.

Theorem 2.2.1. Let $x$ and $x_1, x_2, \dots$ be random variables on a common
probability space $\{\Omega,\mathcal{F},P\}$. Then $x_n \to x$ a.s. if and only if

$$\lim_{n\to\infty} P\Big(\bigcap_{m=n}^{\infty} \{|x_m - x| \le \varepsilon\}\Big) = 1 \quad \text{for every } \varepsilon > 0. \qquad (2.2.1)$$



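Proof: First we prove that $x(\omega) = \lim x_n(\omega)$ pointwise on $\Omega\setminus N$ implies
(2.2.1). Let $\omega_0 \in \Omega\setminus N$. Then for every $\varepsilon > 0$ there is a number $n_0(\omega_0,\varepsilon)$
such that $|x_n(\omega_0) - x(\omega_0)| \le \varepsilon$ if $n \ge n_0(\omega_0,\varepsilon)$. Now consider the following
set in $\mathcal{F}$:

$$A_n(\varepsilon) = \bigcap_{m=n}^{\infty} \{\omega \in \Omega : |x_m(\omega) - x(\omega)| \le \varepsilon\}.$$

Then $\omega_0 \in A_{n_0(\omega_0,\varepsilon)}(\varepsilon)$ and hence $\omega_0 \in \cup_{n=1}^{\infty} A_n(\varepsilon)$. Thus we have
$\Omega\setminus N \subset \cup_{n=1}^{\infty} A_n(\varepsilon)$ and consequently $P(\Omega\setminus N) \le P(\cup_{n=1}^{\infty} A_n(\varepsilon))$.
But $P(\Omega\setminus N) = P(\Omega) - P(N) = 1$ since N is a null set; hence
$P(\cup_{n=1}^{\infty} A_n(\varepsilon)) = 1$.

Since $A_n(\varepsilon) \subset A_{n+1}(\varepsilon)$, we have $A_k(\varepsilon) = \cup_{n=1}^{k} A_n(\varepsilon)$ and thus

$$\lim_{k\to\infty} P(A_k(\varepsilon)) = \lim_{k\to\infty} P\Big(\bigcup_{n=1}^{k} A_n(\varepsilon)\Big) = P\Big(\bigcup_{n=1}^{\infty} A_n(\varepsilon)\Big) = 1,$$

which proves the first part of the theorem.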

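Next we prove that if $\lim_{n\to\infty} P(A_n(\varepsilon)) = 1$ for every $\varepsilon > 0$ then there is
a null set N such that

$$x(\omega) = \lim_{n\to\infty} x_n(\omega) \quad \text{pointwise on } \Omega\setminus N.$$

For every $\omega_0 \in \cup_{n=1}^{\infty} A_n(\varepsilon)$ there exists an index $n_0(\omega_0,\varepsilon)$ such that
$\omega_0 \in A_{n_0(\omega_0,\varepsilon)}(\varepsilon)$, and thus $|x_n(\omega_0) - x(\omega_0)| \le \varepsilon$ if $n \ge n_0(\omega_0,\varepsilon)$ for every
$\omega_0 \in \cup_{n=1}^{\infty} A_n(\varepsilon)$. Put $N_\varepsilon = \Omega \setminus \cup_{n=1}^{\infty} A_n(\varepsilon)$. Then $N_\varepsilon \in \mathcal{F}$ and

$$P(N_\varepsilon) = P(\Omega) - P\Big(\bigcup_{n=1}^{\infty} A_n(\varepsilon)\Big) = 1 - 1 = 0,$$

hence $N_\varepsilon$ is a null set.

However, we must find a null set that is independent of $\varepsilon$. Consider the
sequence $\varepsilon_k = 1/k$ with k = 1, 2, ... . We have just proved that for every
k there is a null set $N_k = N_{\varepsilon_k}$ such that for every $\omega_0 \in \Omega\setminus N_k$:

$$|x_n(\omega_0) - x(\omega_0)| \le \frac1k \quad \text{if } n \ge n_0(\omega_0, \tfrac1k).$$

But then the same applies for every $\omega_0 \in \Omega\setminus N$, where $N = \cup_{k=1}^{\infty} N_k$ is a countable union
of null sets in $\mathcal{F}$ and hence itself a null set in $\mathcal{F}$. By putting $k = [1/\varepsilon] + 1$ for an
arbitrarily chosen $\varepsilon > 0$ we now see that for all $\omega_0 \in \Omega\setminus N$:

$$|x_n(\omega_0) - x(\omega_0)| \le \frac1k < \varepsilon \quad \text{if } n \ge n_0(\omega_0, \tfrac1k),$$

which proves the "if" part of the theorem. □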

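From this theorem we see that $x_n \to x$ a.s. implies $x_n \to x$ in pr.,
because $\cap_{m=n}^{\infty} \{|x_m - x| \le \varepsilon\} \subset \{|x_n - x| \le \varepsilon\}$ and consequently

$$P\Big(\bigcap_{m=n}^{\infty} \{|x_m - x| \le \varepsilon\}\Big) \le P(|x_n - x| \le \varepsilon).$$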



The following simple but important theorem provides another useful criterion 
for almost sure convergence. 

Theorem 2.2.2. (Borel-Cantelli lemma)

Let $x$ and $x_1, x_2, \dots$ be r.v.'s. If for every $\varepsilon > 0$,
$\sum_{n=1}^{\infty} P(|x_n - x| > \varepsilon) < \infty$, then $x_n \to x$ a.s.



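Proof: Consider the set

$$A_n(\varepsilon) = \bigcap_{m=n}^{\infty} \{|x_m - x| \le \varepsilon\} = \bigcap_{m=n}^{\infty} \{\omega \in \Omega : |x_m(\omega) - x(\omega)| \le \varepsilon\},$$

where $\{\Omega,\mathcal{F},P\}$ is the probability space involved. From theorem 2.2.1 it follows
that it suffices to show $P(A_n(\varepsilon)) \to 1$ or equivalently, $P((A_n(\varepsilon))^c) \to 0$. But

$$(A_n(\varepsilon))^c = \bigcup_{m=n}^{\infty} \{|x_m - x| > \varepsilon\}, \quad \text{and hence} \quad P((A_n(\varepsilon))^c) \le \sum_{m=n}^{\infty} P(|x_m - x| > \varepsilon).$$

Since the latter sum is a tail sum of the convergent series $\sum_{m=1}^{\infty} P(|x_m - x| > \varepsilon)$ we
must have $\sum_{m=n}^{\infty} P(|x_m - x| > \varepsilon) \to 0$ as $n \to \infty$, which proves the theorem. □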
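To see that the summability condition in the Borel-Cantelli lemma is doing real work, one may look at the classical "sliding indicator" sequence on $\Omega = [0,1)$ with Lebesgue measure, which converges to zero in probability but not almost surely. The Python sketch below is only an illustration under that assumed setup; the function names `typewriter` and `prob_exceeds` are hypothetical.

```python
def typewriter(n, omega):
    """x_n(omega) on Omega = [0, 1) with Lebesgue measure, n >= 1.

    Write n = 2**k + j with 0 <= j < 2**k; x_n is the indicator of
    the dyadic interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1
    j = n - 2 ** k
    return 1.0 if j / 2 ** k <= omega < (j + 1) / 2 ** k else 0.0


def prob_exceeds(n):
    """P(|x_n - 0| > eps) for any 0 < eps < 1: the interval length 2**-k."""
    return 1.0 / 2 ** (n.bit_length() - 1)


omega = 0.3

# Along the full sequence, x_n(omega) = 1 once in every dyadic block,
# so x_n(omega) does not converge, although P(|x_n| > eps) -> 0.
hits = [n for n in range(1, 2 ** 10) if typewriter(n, omega) == 1.0]

# Along the subsequence n_k = 2**k the probabilities are summable and
# x_{n_k}(omega) is eventually 0 for every omega > 0.
sub = [typewriter(2 ** k, omega) for k in range(10)]

full_sum = sum(prob_exceeds(n) for n in range(1, 2 ** 10))    # grows with the range
sub_sum = sum(prob_exceeds(2 ** k) for k in range(10))        # stays below 2

print(len(hits), sub, full_sum, sub_sum)
```

The probabilities along the full sequence add up to one per dyadic block, so their series diverges and the lemma does not apply; along $n_k = 2^k$ the series is geometric and the lemma yields almost sure convergence.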



The a.s. convergence concept arises in a natural way from the strong laws of 
large numbers. Here we give two versions of these laws. 



Theorem 2.2.3A. If $(x_j)$ is a sequence of uncorrelated random variables and
if

$$E(x_j - E x_j)^2 = O(j^{\mu}) \quad \text{for some } \mu < 1,$$

then

$$\frac1n \sum_{j=1}^{n} (x_j - E x_j) \to 0 \quad \text{a.s.}$$


Proof: This theorem is a further elaboration of the strong law of large numbers
of Rademacher-Menchov [see Revesz (1968), theorem 3.2.1 or Stout (1974), theorem
2.3.2], which states: Let $(y_j)$ be a sequence of orthogonal random variables
(orthogonality means that $E\,y_{j_1} y_{j_2} = 0$ if $j_1 \ne j_2$). If

$$\sum_{j=1}^{\infty} (\log j)^2\, E\,y_j^2 < \infty$$

then

$$\sum_{j=1}^{\infty} y_j \quad \text{converges a.s.}$$

(which means that $\sum_{j=1}^{\infty} y_j$ is a.s. a finite valued random variable).

Now let

$$y_j = (x_j - E x_j)/j.$$

Then

$$\sum_{j=1}^{\infty} (\log j)^2\, E\,y_j^2 = \sum_{j=1}^{\infty} (\log j)^2\, O(j^{\mu-2}) < \infty$$

for $\mu < 1$. (Because $\int_1^{\infty} (\log u)^2 u^{\mu-2}\,du = \int_1^{\infty} (\log u)^2 u^{\mu-1}\,d\log u = \int_0^{\infty} v^2 e^{(\mu-1)v}\,dv < \infty$.)
Consequently

$$\sum_{j=1}^{\infty} \frac{x_j - E x_j}{j} \quad \text{converges a.s.}$$

From the Kronecker lemma [see Revesz (1968), theorem 1.2.2 or Chung (1974), p. 123]
this implies that:

$$\frac1n \sum_{j=1}^{n} (x_j - E x_j) \to 0 \quad \text{a.s.} \quad \Box$$



Remark. The condition $E(x_j - E x_j)^2 = O(j^{\mu})$ for some $\mu < 1$ holds if

$$\sup_n \frac1n \sum_{j=1}^{n} E|x_j - E x_j|^{2+\delta} < \infty \quad \text{for some } \delta > 0,$$

because then $E|x_j - E x_j|^{2+\delta} = O(j)$, so that by Liapounov's inequality

$$E(x_j - E x_j)^2 \le \{E|x_j - E x_j|^{2+\delta}\}^{2/(2+\delta)} = O(j^{2/(2+\delta)}).$$



If the $x_j$ are independent and equally distributed, the condition on
the second moments is not needed:

Theorem 2.2.3B. (Strong law of large numbers of Kolmogorov)

If $(x_j)$ is a sequence of independent and equally distributed random
variables with $E|x_1| < \infty$, then $\frac1n \sum_{j=1}^{n} x_j \to E x_1$ a.s.

Since in this study this theorem is only used in the discussion of the
work of Jennrich (1969) in the next chapter, and since it does not play a further
role in our own contributions to nonlinear estimation, we refer for the
proof to Chung (1974), theorem 5.4.2.
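The growth condition of theorem 2.2.3A can be illustrated by a small simulation. The sketch below is only an assumption-laden example: it draws independent (hence uncorrelated) normal variables with $E x_j = 0$ and variance $j^{\mu}$, $\mu = 0.5$, so the variances are unbounded but the sample mean should still settle near zero. The function name, the seed and the sample sizes are arbitrary choices.

```python
import random


def mean_of_centered(n, mu=0.5, seed=12345):
    """Average of x_1, ..., x_n with x_j ~ N(0, j**mu), drawn independently.

    The variance E(x_j - E x_j)^2 = j**mu is O(j**mu) with mu < 1,
    which is the moment condition of theorem 2.2.3A.
    """
    rng = random.Random(seed)
    total = 0.0
    for j in range(1, n + 1):
        sigma = j ** (mu / 2.0)  # standard deviation, so Var(x_j) = j**mu
        total += rng.gauss(0.0, sigma)
    return total / n


small = mean_of_centered(1_000)
large = mean_of_centered(200_000)
print(small, large)
```

For this design the variance of the sample mean is roughly $\frac{2}{3} n^{\mu - 1}$, which tends to zero only because $\mu < 1$; with $\mu \ge 1$ the same experiment would not stabilise.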

We have already mentioned that almost sure convergence implies convergence
in probability. There is also a converse connection, given by the following
theorem.



Theorem 2.2.4. Let $x$ and $x_1, x_2, \dots$ be r.v.'s. Then $x_n \to x$ in pr. if and
only if every subsequence $(n_k)$ of the sequence $(n)$ contains a further
subsequence $(n_{k_j})$ such that $x_{n_{k_j}} \to x$ a.s.

Proof: Suppose that every subsequence $(n_k)$ contains a further subsequence
$(n_{k_j})$ such that $x_{n_{k_j}} \to x$ a.s., but that not $x_n \to x$ in pr. Then there exist
numbers $\varepsilon > 0$, $\delta > 0$, and a subsequence $(n_k)$ such that

$$P(|x_{n_k} - x| \le \varepsilon) \le 1 - \delta,$$

hence for every further subsequence $(n_{k_j})$ we have the same, which contradicts
our assumption. Thus the "if" part is now proved. Next we suppose that
$x_n \to x$ in pr. Then for every positive integer k

$$\lim_{n\to\infty} P\big(|x_n - x| > 2^{-k}\big) = 0.$$

For each k we can find a $n_k$ such that $P(|x_{n_k} - x| > 2^{-k}) \le 2^{-k}$, hence

$$\sum_{k=1}^{\infty} P\big(|x_{n_k} - x| > 2^{-k}\big) \le \sum_{k=1}^{\infty} 2^{-k} < \infty$$

and consequently

$$\sum_{k=1}^{\infty} P(|x_{n_k} - x| > \varepsilon) < \infty \quad \text{for every } \varepsilon > 0.$$

By the Borel-Cantelli lemma it follows now that $x_{n_k} \to x$ a.s., which proves the
"only if" part. □



Theorem 2.2.5. Let $x$ and $x_1, x_2, \dots$ be r.v.'s such that

a) $x_n \to x$ a.s.

or

b) $x_n \to x$ in pr.,

respectively. Let $f(x)$ be a Borel measurable function on $R^1$. If f is
continuous on a Borel set B such that $P(x \in B) = 1$, then

a) $f(x_n) \to f(x)$ a.s.

or

b) $f(x_n) \to f(x)$ in pr.,

respectively.

Proof: a) Let $\{\Omega,\mathcal{F},P\}$ be the probability space involved. There is a null set
$N_1$ such that for every $\varepsilon > 0$ and every $\omega \in \Omega\setminus N_1$:

$$|x_n(\omega) - x(\omega)| \le \varepsilon \quad \text{if } n \ge n_0(\omega,\varepsilon).$$

Let $N_2 = \{\omega \in \Omega : x(\omega) \notin B\}$. Then $N_2$ is a null set in $\mathcal{F}$ and $x(\omega)$ is
for every $\omega \in \Omega\setminus N_2$ a continuity point of f. Thus for every $\varepsilon > 0$ and every
$\omega \in \Omega\setminus N_2$ there is a number $\delta(\varepsilon,\omega)$ such that:

$$|f(x_n(\omega)) - f(x(\omega))| \le \varepsilon \quad \text{if } |x_n(\omega) - x(\omega)| \le \delta(\varepsilon,\omega).$$

Hence for every $\varepsilon > 0$ and every $\omega \in \Omega\setminus(N_1 \cup N_2)$ we have:

$$|f(x_n(\omega)) - f(x(\omega))| \le \varepsilon \quad \text{if } n \ge n_0(\omega, \delta(\varepsilon,\omega)).$$

Since $N_1 \cup N_2$ is a null set in $\mathcal{F}$, this proves part a) of the theorem.

b) For an arbitrary subsequence $(n_k)$ we have a further subsequence $(n_{k_j})$
such that $x_{n_{k_j}} \to x$ a.s. (see theorem 2.2.4) and consequently by part a) of the theorem:
$f(x_{n_{k_j}}) \to f(x)$ a.s. By theorem 2.2.4 this implies $f(x_n) \to f(x)$ in pr. □



Remark. So far in this section we have only dealt with random variables in $R^1$.
But generalisation of the definitions and the theorems in this section from
random variables to finite dimensional random vectors is straightforward, simply
by changing random variable to random vector.



2.2.2 Convergence of mathematical expectations 



If $x$ and $x_1, x_2, \dots$ are random variables or vectors such that for some
$p > 0$, $E|x_n - x|^p \to 0$ as $n \to \infty$, then it follows from Chebishev's inequality
that $x_n \to x$ in pr. The converse is not always true. A partial converse is given
by the following theorem.

Theorem 2.2.6. If $x_n \to x$ in pr. and if there is a r.v. $y$ satisfying
$|x_n| \le y$ a.s. for n = 1,2,... and $E\,y^p < \infty$ for some $p > 0$, then:

$$E|x_n - x|^p \to 0.$$

Proof. If $P(|x| > y) > 0$ then $x_n \to x$ in pr. is not possible. Hence $|x| \le y$ a.s.
Since now $|x_n - x| \le 2y$ and $x_n - x \to 0$ in pr., there is no loss of generality in
assuming $x = 0$ a.s. We then have:

$$\int |x_n(\omega)|^p\,P(d\omega) = \int_{\{|x_n(\omega)| \le \varepsilon\}} |x_n(\omega)|^p\,P(d\omega) + \int_{\{|x_n(\omega)| > \varepsilon\}} |x_n(\omega)|^p\,P(d\omega) \le \varepsilon^p + \int_{\{|x_n(\omega)| > \varepsilon\}} y(\omega)^p\,P(d\omega).$$

The theorem follows now from theorem 2.1.7. □

Putting p=l in theorem 2.2.6 we have: 



Theorem 2.2.7. (Bounded convergence theorem)

If $x_n \to x$ in pr. and if $|x_n| \le y$ a.s., where $E\,y < \infty$, then $E\,x_n \to E\,x$.



We use this theorem for proving first Fatou’s lemma which in its turn will 
be used for proving the monotone convergence theorem. 



Theorem 2.2.8. (Fatou's lemma)

If $x_n \ge 0$ a.s., then $E \liminf x_n \le \liminf E\,x_n$.

Proof: Put $x = \liminf x_n$ and let $\phi(x)$ be any simple function satisfying
$0 \le \phi(x) \le x$. Put $y_n = \min(\phi(x), x_n)$. Then $y_n \to \phi(x)$ in pr. because:

$$P(|\min(\phi(x), x_n) - \phi(x)| > \varepsilon) = P(x_n < \phi(x) - \varepsilon) \le P(x_n < x - \varepsilon) \to 0.$$

Moreover, since $\phi(x)$ is a simple function we must have $E\,\phi(x) < \infty$. From
the bounded convergence theorem and from $y_n \le x_n$ it follows now:

$$E\,\phi(x) = \lim E\,y_n = \liminf E\,y_n \le \liminf E\,x_n.$$

Taking the supremum over all such simple functions $\phi$ it follows now from
definition 2.1.7:

$$E\,x \le \liminf E\,x_n,$$

which was to be proved. □
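The inequality in Fatou's lemma can be strict, and the same example shows why a dominating variable is needed in the bounded convergence theorem. The following Python sketch assumes $\Omega = [0,1]$ with Lebesgue measure and the sequence $x_n = n$ on $(0, 1/n)$ and $0$ elsewhere; both the example and the crude midpoint-rule integration are illustrative assumptions, not part of the text.

```python
def x_n(n, omega):
    """x_n(omega) = n on (0, 1/n) and 0 elsewhere, for omega in [0, 1]."""
    return float(n) if 0.0 < omega < 1.0 / n else 0.0


def expectation(n, grid=100_000):
    """Midpoint-rule approximation of E x_n under Lebesgue measure on [0, 1]."""
    h = 1.0 / grid
    return sum(x_n(n, (i + 0.5) * h) for i in range(grid)) * h


# E x_n = n * (1/n) = 1 for every n, so liminf E x_n = 1 ...
expectations = [expectation(n) for n in (4, 10, 100)]

# ... while for any fixed omega > 0 the sequence x_n(omega) is eventually 0,
# so liminf x_n = 0 a.s. and E liminf x_n = 0 < 1.
omega = 0.3
tail_values = [x_n(n, omega) for n in range(4, 200)]

print(expectations, max(tail_values))
```

Here no integrable $y$ with $|x_n| \le y$ exists, which is consistent with $E\,x_n$ failing to converge to $E \lim x_n = 0$.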

Theorem 2.2.9. (Monotone convergence theorem)

Let $(x_n)$ be a non-decreasing sequence of r.v.'s. Then $E \lim x_n = \lim E\,x_n$.

Proof: Since our sequence $(x_n)$ is non-decreasing, we have:

$$\lim x_n = \liminf x_n, \qquad \lim E\,x_n = \liminf E\,x_n,$$

so that by Fatou's lemma:

$$E \lim x_n \le \lim E\,x_n.$$

But for any n we have $x_n \le \lim x_n$ a.s. because $(x_n)$ is non-decreasing, hence
$E\,x_n \le E \lim x_n$ and consequently:

$$\lim E\,x_n \le E \lim x_n,$$

which proves the theorem. □

2.2.3 Convergence of distributions 

If $x$ and $x_1, x_2, \dots$ are r.v.'s with distribution functions $F, F_1, F_2, \dots$,
respectively, then one would like to say that $x_n$ converges in distribution
to $x$ if for every $t \in R^1$, $F_n(t) \to F(t)$.

However, if $x$ and F are given and if we define $x_n = x + \frac1n$, then:

$$F_n(t) = P(x_n \le t) = P(x \le t - \tfrac1n) = F(t - \tfrac1n),$$

so that for every discontinuity point $t_0$ of F we have:

$$\lim F_n(t_0) = \lim F(t_0 - \tfrac1n) = F(t_0-) < F(t_0),$$

while intuitively we should expect that in this case we also have convergence
in distribution. Moreover, if $x_n = x + n$ we have
$F_n(t) = F(t-n) \to F(-\infty) = 0$ for every t. Thus not every sequence of distribution
functions converges to another distribution function. In the latter case
we say that the convergence is improper.
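Both phenomena can be checked on a concrete case. The Python sketch below assumes, purely for illustration, that $x$ takes the values 0 and 1 with probability 1/2 each, so that F has jumps at 0 and 1; the function names are hypothetical.

```python
def F(t):
    """Distribution function of x with P(x = 0) = P(x = 1) = 1/2."""
    if t < 0.0:
        return 0.0
    if t < 1.0:
        return 0.5
    return 1.0


def F_small_shift(t, n):
    """Distribution function of x_n = x + 1/n, i.e. F(t - 1/n)."""
    return F(t - 1.0 / n)


def F_large_shift(t, n):
    """Distribution function of x_n = x + n, i.e. F(t - n)."""
    return F(t - n)


ns = (1, 10, 100, 1000)

# At the discontinuity point t0 = 0: F_n(0) = 0 for all n, but F(0) = 0.5.
at_jump = [F_small_shift(0.0, n) for n in ns]

# At the continuity point t = 0.5: F_n(0.5) reaches F(0.5) = 0.5.
at_continuity = [F_small_shift(0.5, n) for n in ns]

# Mass escaping to infinity: F(t - n) -> 0 for fixed t.
escaping = [F_large_shift(5.0, n) for n in (1, 3, 5, 7, 10)]

print(at_jump, at_continuity, escaping)
```

The first list stays at 0 although $x_n \to x$ for every outcome, which is the reason discontinuity points are excluded when convergence of distribution functions is defined; the last list shows a limit that is not a distribution function.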



Definition 2.2.3. A sequence $(F_n(t))$ of distribution functions converges
properly if $F_n(t) \to F(t)$ pointwise for all continuity points of F, where F is
a distribution function. We then write: $F_n \to F$ properly.

The exclusion of discontinuity points avoids the complication that otherwise
the function $F(t) = \lim F_n(t)$ may not be right continuous. In view of the
above example we now define:

Definition 2.2.4. A sequence $(x_n)$ of random variables (or random vectors)
converges in distribution to a random variable (or random vector) $x$, if
their underlying distribution functions $F_n$, F, respectively, satisfy
$F_n \to F$ properly. We then write $x_n \to x$ in distr.

Remark: If this "limit" distribution F is the distribution function of (for
example) the normal distribution $N(\mu,\sigma^2)$, we shall also write:
$x_n \to N(\mu,\sigma^2)$ in distr.

There is a close connection between proper convergence of distribution
functions and convergence of mathematical expectations, as is shown by the
following theorem. This theorem is fundamental since it allows a variety
of applications.

Theorem 2.2.10. Let F and $F_n$, n = 1,2,... be distribution functions on
$R^k$. Then $F_n \to F$ properly if and only if for every uniformly bounded
continuous real function $\phi$ on $R^k$:

$$\int \phi(t)\,dF_n(t) \to \int \phi(t)\,dF(t).$$



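Proof: Since the proof for the general case k > 1 is a straightforward
extension of that for k = 1, we assume k = 1.

Suppose $F_n \to F$ properly. We can always find for given $\varepsilon > 0$ continuity points a and
b of F such that $F(b) - F(a) > 1 - \varepsilon$. Let $\phi$ be any uniformly bounded continuous
real function on R with uniform bound 1 (which is no restriction). By the uniform
continuity of $\phi$ on [a,b] we can find continuity points $t_1, t_2, \dots, t_m$ of F
satisfying

$$a = t_1 < t_2 < \dots < t_{m-1} < t_m = b \quad \text{and} \quad \sup_{t \in (t_i,t_{i+1}]} \phi(t) - \inf_{t \in (t_i,t_{i+1}]} \phi(t) \le \varepsilon \quad \text{for } i = 1,2,\dots,m-1.$$

Now define

$$\psi(t) = \begin{cases} \inf_{s \in (t_i,t_{i+1}]} \phi(s) & \text{for } t \in (t_i,t_{i+1}],\ i = 1,\dots,m-1, \\ 0 & \text{elsewhere.} \end{cases}$$

Then

$$0 \le \phi(t) - \psi(t) \le \varepsilon \quad \text{for } t \in (a,b], \qquad |\phi(t) - \psi(t)| \le 1 \quad \text{for } t \notin (a,b],$$

hence:

$$\Big|\int \phi(t)\,dF_n(t) - \int \psi(t)\,dF_n(t)\Big| \le \int_{\{t \in (a,b]\}} \varepsilon\,dF_n(t) + \int_{\{t \notin (a,b]\}} dF_n(t)$$
$$= \varepsilon(F_n(b) - F_n(a)) + 1 - F_n(b) + F_n(a) \to \varepsilon(F(b) - F(a)) + 1 - F(b) + F(a) \le 2\varepsilon.$$

Moreover,

$$\int \psi(t)\,dF_n(t) = \sum_{i=1}^{m-1} \Big\{\inf_{t \in (t_i,t_{i+1}]} \phi(t)\Big\}(F_n(t_{i+1}) - F_n(t_i)) \to \sum_{i=1}^{m-1} \Big\{\inf_{t \in (t_i,t_{i+1}]} \phi(t)\Big\}(F(t_{i+1}) - F(t_i)) = \int \psi(t)\,dF(t)$$

and $|\int \phi(t)\,dF(t) - \int \psi(t)\,dF(t)| \le 2\varepsilon$. So we have:

$$\Big|\int \phi(t)\,dF_n(t) - \int \phi(t)\,dF(t)\Big| \le 4\varepsilon + \Big|\int \psi(t)\,dF_n(t) - \int \psi(t)\,dF(t)\Big| \le 5\varepsilon$$

for sufficiently large n, which proves the "only if" part of the theorem.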
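Now let u be a continuity point of F and define

$$\phi(t) = \begin{cases} 1 & \text{if } t \le u, \\ 0 & \text{if } t > u, \end{cases}$$

$$\phi_{1,m}(t) = \begin{cases} 1 & \text{if } t \le u - \frac1m, \\ -mt + mu & \text{if } t \in (u - \frac1m, u], \\ 0 & \text{if } t > u, \end{cases}$$

$$\phi_{2,m}(t) = \begin{cases} 1 & \text{if } t \le u, \\ -mt + 1 + mu & \text{if } t \in (u, u + \frac1m], \\ 0 & \text{if } t > u + \frac1m. \end{cases}$$

Then $\phi_{1,m}$ and $\phi_{2,m}$ are uniformly bounded continuous functions on R
satisfying $\phi_{1,m}(t) \le \phi(t) \le \phi_{2,m}(t)$, hence for m = 1,2,... and $n \to \infty$:

$$\int \phi_{1,m}(t)\,dF_n(t) \le F_n(u) = \int \phi(t)\,dF_n(t) \le \int \phi_{2,m}(t)\,dF_n(t),$$

where the left and right members converge to the corresponding members of

$$\int \phi_{1,m}(t)\,dF(t) \le F(u) = \int \phi(t)\,dF(t) \le \int \phi_{2,m}(t)\,dF(t).$$

Moreover

$$0 \le \int (\phi_{2,m}(t) - \phi_{1,m}(t))\,dF(t) \le \int_{\{t \in (u - \frac1m,\, u + \frac1m]\}} dF(t) \le F(u + \tfrac1m) - F(u - \tfrac1m).$$

Since u is a continuity point of F, $F(u + \frac1m) - F(u - \frac1m)$ can be
made arbitrarily small by increasing m; hence $F_n(u) \to F(u)$, which proves
the "if" part. □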
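The two ingredients of this proof can be tried out numerically. The Python sketch below assumes, for illustration only, a two-point variable $x$ (values 0 and 1, probability 1/2 each) and $x_n = x + 1/n$; it evaluates $E\,\phi(x_n)$ for one bounded continuous $\phi$ and the piecewise linear envelopes of the indicator of $(-\infty, u]$. The names `expect`, `phi1`, `phi2` and `bounded` are hypothetical.

```python
ATOMS = [(0.0, 0.5), (1.0, 0.5)]  # (value, probability) pairs of x (assumed)


def expect(phi, shift=0.0):
    """E phi(x + shift) for the two-point variable x."""
    return sum(prob * phi(value + shift) for value, prob in ATOMS)


def phi1(t, u, m):
    """Continuous lower envelope of the indicator of (-inf, u]."""
    if t <= u - 1.0 / m:
        return 1.0
    if t <= u:
        return m * (u - t)  # equals -m*t + m*u
    return 0.0


def phi2(t, u, m):
    """Continuous upper envelope of the indicator of (-inf, u]."""
    if t <= u:
        return 1.0
    if t <= u + 1.0 / m:
        return 1.0 - m * (t - u)  # equals -m*t + 1 + m*u
    return 0.0


def bounded(t):
    """A uniformly bounded continuous test function."""
    return 1.0 / (1.0 + t * t)


# E phi(x_n) -> E phi(x) = 0.5 * 1 + 0.5 * 0.5 = 0.75 for x_n = x + 1/n.
limit = expect(bounded)
gaps = [abs(expect(bounded, 1.0 / n) - limit) for n in (1, 10, 100, 1000)]

# Sandwich for F_n(u) at the continuity point u = 0.5, with n = 1000, m = 4.
u, m, n = 0.5, 4, 1000
lower = expect(lambda t: phi1(t, u, m), 1.0 / n)
upper = expect(lambda t: phi2(t, u, m), 1.0 / n)

print(limit, gaps, lower, upper)
```

In this example the lower and upper bounds already coincide with $F(u) = 0.5$, because F puts no mass in $(u - \frac1m, u + \frac1m]$.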

A direct consequence of this theorem is that

Theorem 2.2.11. $x_n \to x$ in pr. implies $x_n \to x$ in distr.,

because by theorem 2.2.5 it follows that for any continuous function $\phi$ we have
that $x_n \to x$ in pr. implies $\phi(x_n) \to \phi(x)$ in pr., while by theorem 2.2.7,
$\phi(x_n) \to \phi(x)$ in pr. implies $E\,\phi(x_n) \to E\,\phi(x)$ if $\phi$ is a uniformly bounded
continuous function.



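The converse of this theorem is not generally true, but it is if $x$ is constant
a.s., that is: $P(x = c) = 1$ for some constant c. In that case the proper limit F
involved is:

$$F(t) = \begin{cases} 1 & \text{if } t \ge c, \\ 0 & \text{if } t < c. \end{cases}$$

The proof of this proposition is very simple:

$$P(|x_n - c| \le \varepsilon) = P(c - \varepsilon \le x_n \le c + \varepsilon) = F_n(c + \varepsilon) - F_n((c - \varepsilon)-) \to F(c + \varepsilon) - F(c - \varepsilon) = 1 \quad \text{for every } \varepsilon > 0,$$

since $c + \varepsilon$ and $c - \varepsilon$ are continuity points of F. Thus: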



Theorem 2.2.12 . Convergence in distribution to a constant implies 
convergence in probability to that constant. 

Let $x_n$ and $x$ be random vectors in $R^k$ such that $x_n \to x$ in distr. and let f
be any continuous real function on $R^k$. For any uniformly bounded continuous
real function $\phi$ on $R^1$ it follows that $\phi(f)$ is a uniformly bounded continuous
real function on $R^k$, so that by theorem 2.2.10, $E\,\phi(f(x_n)) \to E\,\phi(f(x))$ and
consequently $f(x_n) \to f(x)$ in distr. Thus we have:

Theorem 2.2.13. Let $x_n$ and $x$ be random vectors in $R^k$ and let f be a
continuous real function on $R^k$. Then $x_n \to x$ in distr. implies
$f(x_n) \to f(x)$ in distr.



A more general result is given by the following theorem.

Theorem 2.2.14. Let $x_n$ and $x$ be random vectors in $R^k$, $y_n$ a random
vector in $R^m$ and c a non-random vector in $R^m$.
If $x_n \to x$ in distr. and $y_n \to c$ in distr. then for any continuous real
function f on $R^k \times C$, where C is some subset of $R^m$ with interior point c,
$f(x_n, y_n) \to f(x, c)$ in distr.



Proof: Again we prove the theorem for the case k = m = 1 since the proof of the
general case is similar.

It suffices to prove that for any uniformly bounded continuous real function
$\phi$ on $R^2$ we have

$$E\,\phi(x_n, y_n) \to E\,\phi(x, c),$$

because then

$$E\,\psi(f(x_n, y_n)) \to E\,\psi(f(x, c))$$

for any uniformly bounded continuous real function $\psi$ on $R^1$, which by theorem
2.2.10 implies $f(x_n, y_n) \to f(x, c)$ in distr.

Let M be the uniform bound of $\phi$ and let $F_n$ and F be the distribution functions
of $x_n$ and $x$, respectively. For every $\varepsilon > 0$ we can choose continuity points a and
b of F such that

$$P(x \in (a,b]) = F(b) - F(a) > 1 - \frac{\varepsilon}{2M}.$$

Moreover, for any $\delta > 0$ we have

$$|E\,\phi(x_n, y_n) - E\,\phi(x_n, c)| \le \int_{\{x_n \in (a,b]\} \cap \{|y_n - c| \le \delta\}} |\phi(x_n, y_n) - \phi(x_n, c)|\,dP + \int_{\{x_n \in (a,b]\} \cap \{|y_n - c| > \delta\}} |\phi(x_n, y_n) - \phi(x_n, c)|\,dP + \int_{\{x_n \notin (a,b]\}} |\phi(x_n, y_n) - \phi(x_n, c)|\,dP$$
$$\le \int_{\{x_n \in (a,b]\} \cap \{|y_n - c| \le \delta\}} |\phi(x_n, y_n) - \phi(x_n, c)|\,dP + 2M\,P(\{x_n \in (a,b]\} \cap \{|y_n - c| > \delta\}) + 2M\,P(x_n \notin (a,b])$$
$$\le \int_{\{x_n \in (a,b]\} \cap \{|y_n - c| \le \delta\}} |\phi(x_n, y_n) - \phi(x_n, c)|\,dP + 2M\,P\{|y_n - c| > \delta\} + 2M(1 - F_n(b) + F_n(a)).$$

Since by theorem 2.2.12, $y_n \to c$ in pr., we have $P\{|y_n - c| > \delta\} \to 0$ for any $\delta > 0$,
while $\lim 2M(1 - F_n(b) + F_n(a)) = 2M(1 - F(b) + F(a)) < \varepsilon$. Furthermore, since
$\phi(t_1, t_2)$ is uniformly continuous on the bounded set
$\{(t_1,t_2) \in R^2 : a < t_1 \le b,\ |t_2 - c| \le \delta\}$, provided that $\delta$ is so small that this
set is contained in C, we can choose $\delta$ such that the last written
integral above is smaller than $\varepsilon$. So we conclude:

$$E\,\phi(x_n, y_n) - E\,\phi(x_n, c) \to 0.$$

Since obviously $E\,\phi(x_n, c) \to E\,\phi(x, c)$ because $x_n \to x$ in distr., the theorem
follows. □

2.2.4 Convergence of distributions and mathematical expectations



The condition in theorem 2.2.10 that the function $\phi$ is uniformly bounded
is a serious limitation for econometric applications. So we shall try to get
rid of it. But before doing this we introduce some notation:

Definition 2.2.5: Let $\phi$ be any function on a Euclidean space.
By $\bar\phi(x)$ we mean the function

$$\bar\phi(x) = \sup_{\{|y| \le |x|\}} |\phi(y)|, \qquad (2.2.1)$$

which obviously is a function of $|x|$. This notation plays a role in the
following extension of the "only if" part of theorem 2.2.10.



Theorem 2.2.15. Let $(F_n)$ be a sequence of distribution functions on $R^k$
satisfying $F_n \to F$ properly. Let $\phi(x)$ be a continuous real function on $R^k$
such that

$$\sup_n \int \{\bar\phi(x)\}^{1+\delta}\,dF_n(x) < \infty \quad \text{for some } \delta > 0. \qquad (2.2.2)$$

Then

$$\int \phi(x)\,dF_n(x) \to \int \phi(x)\,dF(x).$$

For proving this theorem we need the following lemma.

Lemma 2.2.1. Let $\phi$ be a continuous real function on $R^k$ and let for $a \ge 0$:

$$\psi(a) = \sup_{|x| \le a} |\phi(x)|.$$

Then $\psi(a)$ is continuous on $[0,\infty)$.



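Proof: The function $\phi$ is uniformly continuous on the set $S = \{x \in R^k : |x| \le a+1\}$, $a > 0$.
Thus for every $\varepsilon > 0$ there is a $\delta > 0$ such that for every $x_1 \in S$, $x_2 \in S$ satisfying
$|x_1 - x_2| < \delta$ we have $|\phi(x_1) - \phi(x_2)| < \varepsilon$. Now let $x \in S$ and let $\theta$ be a number in (0,1).
Then $\theta x \in S$ and

$$|\phi(x) - \phi(\theta x)| < \varepsilon \quad \text{if } |x - \theta x| = (1-\theta)|x| \le (1-\theta)(a+1) < \delta.$$

Thus choose $\theta$ such that

$$1 - \frac{\delta}{a+1} < \theta < 1,$$

which is possible for $\theta$ sufficiently close to 1.
Next let $0 < \delta_1 < 1$. We have

$$\psi(a + \delta_1) = \sup_{|x| \le a+\delta_1} |\phi(x)| \le \sup_{|x| \le a+\delta_1} |\phi(x) - \phi(\theta x)| + \sup_{|x| \le a+\delta_1} |\phi(\theta x)| \le \varepsilon + \sup_{|x| \le \theta(a+\delta_1)} |\phi(x)| \le \varepsilon + \psi(a)$$

if $\theta(a + \delta_1) \le a$. Thus if we choose $\delta_1$ so small that
$\delta_1 \le a(1-\theta)/\theta$, then

$$0 \le \psi(b) - \psi(a) \le \varepsilon \quad \text{for } a < b \le a + \delta_1.$$

This proves that $\psi$ is right continuous on $(0,\infty)$. By letting $a \downarrow 0$ we see that $\psi$
is also right continuous on $[0,\infty)$. By a similar argument it can be shown that
$\psi$ is left continuous on $(0,\infty)$. □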
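The envelope function of lemma 2.2.1 is easy to approximate on a grid, which gives a feeling for why it is non-decreasing and continuous even when $\phi$ oscillates. The Python sketch below takes $\phi(x) = x \sin x$ on $R^1$ as an illustrative assumption and approximates $\sup_{|x| \le a} |\phi(x)|$ by a running maximum over grid points; the function names, grid step and range are arbitrary choices.

```python
import math


def phi(x):
    """An unbounded, oscillating continuous function (illustrative choice)."""
    return x * math.sin(x)


def psi_on_grid(a_max, step):
    """Approximate psi(a) = sup_{|x| <= a} |phi(x)| at a = 0, step, 2*step, ...

    The supremum is approximated by a running maximum over the grid
    points x = +/- i*step with i*step <= a.
    """
    values = []
    running = 0.0
    n = int(round(a_max / step))
    for i in range(n + 1):
        a = i * step
        running = max(running, abs(phi(a)), abs(phi(-a)))
        values.append(running)
    return values


psi = psi_on_grid(10.0, 0.001)

# psi is non-decreasing by construction; small increments between
# neighbouring grid points are consistent with continuity.
largest_jump = max(later - earlier for earlier, later in zip(psi, psi[1:]))

print(psi[0], psi[-1], largest_jump)
```

On $[0,10]$ the largest value of $|x \sin x|$ is attained near $x \approx 7.98$, so the approximation ends at about 7.92 and stays flat from there on, although $\phi$ itself keeps oscillating.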

Proof of theorem 2.2.15. Since $|\phi(x)| \le \bar\phi(x)$ and $\bar\phi(x)$ is a non-decreasing function
of $|x|$ it follows that $\lim_{|x|\to\infty} \bar\phi(x) < \infty$ implies that $\phi(x)$ is uniformly bounded. In that
case the theorem follows from theorem 2.2.10. Thus without loss of generality
we may assume

$$\bar\phi(x) \to \infty \quad \text{as } |x| \to \infty. \qquad (2.2.3)$$

Now define for $a > 0$:

$$\phi_a(x) = \begin{cases} \phi(x) & \text{if } |x| \le a, \\ \phi(a x/|x|) & \text{if } |x| > a. \end{cases} \qquad (2.2.4)$$

Then $\phi_a$ is a uniformly bounded continuous real function on $R^k$, hence by
theorem 2.2.10:

$$\int \phi_a(x)\,dF_n(x) \to \int \phi_a(x)\,dF(x). \qquad (2.2.5)$$

Moreover,

$$\Big|\int \phi(x)\,dF_n(x) - \int \phi_a(x)\,dF_n(x)\Big| = \Big|\int_{\{|x|>a\}} \phi(x)\,dF_n(x) - \int_{\{|x|>a\}} \phi_a(x)\,dF_n(x)\Big| \le 2 \int_{\{|x|>a\}} \bar\phi(x)\,dF_n(x)$$
$$= 2 \int_{\{|x|>a\}} \frac{\{\bar\phi(x)\}^{1+\delta}}{\{\bar\phi(x)\}^{\delta}}\,dF_n(x) \le \frac{2 \sup_n \int \{\bar\phi(x)\}^{1+\delta}\,dF_n(x)}{\inf_{\{|x|>a\}} \{\bar\phi(x)\}^{\delta}} \to 0 \quad \text{as } a \to \infty, \qquad (2.2.6)$$

because of (2.2.2) and (2.2.3). Similarly we have

$$\Big|\int \phi(x)\,dF(x) - \int \phi_a(x)\,dF(x)\Big| \le 2 \int_{\{|x|>a\}} \bar\phi(x)\,dF(x). \qquad (2.2.7)$$

If $\int \bar\phi(x)\,dF(x)$ converges, then (2.2.7) tends to zero as $a \to \infty$, hence the
theorem follows from (2.2.5), (2.2.6) and (2.2.7). For showing

$$\int \bar\phi(x)\,dF(x) < \infty \qquad (2.2.8)$$

we observe that by lemma 2.2.1, $\psi(|x|) = \bar\phi(x)$ is a continuous function of $|x|$.
Now let

$$\psi_a(|x|) = \begin{cases} \psi(|x|) & \text{if } |x| \le a, \\ \psi(a) & \text{if } |x| > a. \end{cases} \qquad (2.2.9)$$

Then $\psi_a(|x|)$ is a uniformly bounded continuous function on $R^k$, and for any $|x|$
a non-decreasing function of $a > 0$. Since obviously $\psi_a(|x|) \le \psi(|x|)$ and
$\lim_{a\to\infty} \psi_a(|x|) = \psi(|x|)$, we have by the monotone convergence theorem and theorem
2.2.10:

$$\int \psi(|x|)\,dF(x) = \lim_{a\to\infty} \int \psi_a(|x|)\,dF(x) = \lim_{a\to\infty} \Big\{\lim_{n\to\infty} \int \psi_a(|x|)\,dF_n(x)\Big\} \le \sup_n \int \psi(|x|)\,dF_n(x) = \sup_n \int \bar\phi(x)\,dF_n(x) < \infty. \qquad (2.2.10)$$

Thus (2.2.8) is proved by now and so is the theorem. □

Along similar lines we can prove the following version of the well-known
weak law of large numbers:

Theorem 2.2.16. Let $x_1, x_2, \dots$ be a sequence of independent random
vectors in $R^k$, and let $(F_j(x))$ be the sequence of corresponding
distribution functions. Let $\phi(x)$ be a continuous function on $R^k$.
If

$$\frac1n \sum_{j=1}^{n} F_j(x) \to G(x) \quad \text{properly}$$

and

$$\sup_n \frac1n \sum_{j=1}^{n} E\{\bar\phi(x_j)\}^{1+\delta} < \infty \quad \text{for some } \delta > 0,$$

then

$$\operatorname{plim} \frac1n \sum_{j=1}^{n} \phi(x_j) = \int \phi(x)\,dG(x).$$

Proof: If $\bar\phi(x)$ remains bounded for $|x| \to \infty$ then $\phi(x)$ is uniformly bounded. In
that case the theorem follows from the usual weak law of large numbers and
theorem 2.2.10. Thus we assume now that $\bar\phi(x) \to \infty$ as $|x| \to \infty$.

Consider the function $\phi_a(x)$ defined in (2.2.4). Then obviously by the
independence of the $x_j$, the boundedness of $\phi_a(x)$ and Chebishev's inequality:

$$\operatorname{plim} \frac1n \sum_{j=1}^{n} \{\phi_a(x_j) - E\,\phi_a(x_j)\} = 0, \qquad (2.2.11)$$

while from theorem 2.2.10 it follows

$$\lim_{n\to\infty} E\,\frac1n \sum_{j=1}^{n} \phi_a(x_j) = \int \phi_a(x)\,dG(x). \qquad (2.2.12)$$

Hence

$$\operatorname{plim} \frac1n \sum_{j=1}^{n} \phi_a(x_j) = \int \phi_a(x)\,dG(x). \qquad (2.2.13)$$

Moreover, since $\phi_a$ is uniformly bounded, it follows from theorem 2.2.6
that (2.2.13) implies

$$E\,\Big|\frac1n \sum_{j=1}^{n} \phi_a(x_j) - \int \phi_a(x)\,dG(x)\Big| \to 0 \quad \text{as } n \to \infty. \qquad (2.2.14)$$

Furthermore, similar to (2.2.6) it follows that

$$\limsup_{n\to\infty} E\,\Big|\frac1n \sum_{j=1}^{n} \phi(x_j) - \frac1n \sum_{j=1}^{n} \phi_a(x_j)\Big| \to 0 \quad \text{as } a \to \infty \qquad (2.2.15)$$

and similar to (2.2.7) that

$$\Big|\int \phi(x)\,dG(x) - \int \phi_a(x)\,dG(x)\Big| \to 0 \quad \text{as } a \to \infty. \qquad (2.2.16)$$

Combining (2.2.14), (2.2.15) and (2.2.16) we see that

$$\lim_{n\to\infty} E\,\Big|\frac1n \sum_{j=1}^{n} \phi(x_j) - \int \phi(x)\,dG(x)\Big| = 0. \qquad (2.2.17)$$

The theorem follows now from (2.2.17) and Chebishev's inequality. □

Remark. The difference between this theorem and the classical weak law of large
numbers is that the finiteness of second moments is not necessary.

If we combine the theorems 2.2.3A and 2.2.15 we obtain the
following strong version of theorem 2.2.16.

Theorem 2.2.17. Let the conditions of theorem 2.2.16 be satisfied,
and assume in addition that

$$E\{\bar\phi(x_j)\}^2 = O(j^{\mu}) \quad \text{with } \mu < 1. \;*) \qquad (2.2.18)$$

Then $\frac1n \sum_{j=1}^{n} \phi(x_j) \to \int \phi(x)\,dG(x)$ a.s.



2.3 Uniform convergence of random functions

2.3.1 Random functions. Uniform strong and weak convergence

A random function is a function which is a r.v. for each value of its
argument. Usually random functions occur as a function of both random variables
and parameters, for example the sum of squares of a regression model. Their
definition is similar to that of random variables:

Definition 2.3.1: Let $\{\Omega,\mathcal{F},P\}$ be a probability space and let $\Theta$ be a
subset of $R^k$. The real function $f(\theta) = f(\theta,\omega)$ on $\Theta \times \Omega$ is called a (real)
random function on $\Theta$ if for every $t \in R^1$ and every $\theta \in \Theta$,

$$\{\omega \in \Omega : f(\theta,\omega) \le t\} \in \mathcal{F}.$$



*) Note that $E|\phi(x_j) - E\,\phi(x_j)|^2 \le E\{\bar\phi(x_j)\}^2$. The condition
$E|\phi(x_j) - E\,\phi(x_j)|^2 = O(j^{\mu})$ with $\mu < 1$ would be sufficient here,
but condition (2.2.18) will be convenient for later purpose.

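However, dealing with random functions one should be aware of some pitfalls.
First, if $f(\theta)$ is a random function on an uncountable subset $\Theta$ of a
Euclidean space, then $\sup_{\theta\in\Theta} f(\theta)$ and $\inf_{\theta\in\Theta} f(\theta)$ are not automatically random
variables, for:

$$\{\omega \in \Omega : \inf_{\theta\in\Theta} f(\theta,\omega) \le t\} = \bigcup_{\theta\in\Theta} \{\omega \in \Omega : f(\theta,\omega) \le t\}$$

and

$$\{\omega \in \Omega : \sup_{\theta\in\Theta} f(\theta,\omega) \le t\} = \bigcap_{\theta\in\Theta} \{\omega \in \Omega : f(\theta,\omega) \le t\}$$

are then uncountable unions and intersections, respectively, of members
of the Borel field $\mathcal{F}$ and therefore not necessarily members of $\mathcal{F}$ themselves.
Another pitfall is that if $\underline{\theta}$ is a random vector in an uncountable subset $\Theta$
of a Euclidean space and if $f(\theta)$ is a random function on $\Theta$, then $f(\underline{\theta})$ is
not necessarily a random variable, because

$$\{\omega \in \Omega : f(\underline{\theta}(\omega),\omega) \le t\} = \bigcup_{\theta_* \in \Theta} \big[\{\omega \in \Omega : f(\theta_*,\omega) \le t\} \cap \{\omega \in \Omega : \underline{\theta}(\omega) = \theta_*\}\big]$$

is an uncountable union of members of $\mathcal{F}$.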

These problems can be overcome if we assume that the random function $f(\theta)$ is
separable [see Gihman and Skorohod (1974), chapter III, section 2]. However,
in this study we shall only deal with random functions of the type

$$f(\theta) = \phi(\theta, x),$$

where $\phi$ is a continuous real function on $\Theta \times R^m$ with $\Theta$ a subset of $R^k$ and
$x$ is a random vector in $R^m$, and for this case we do not need the separability
concept, because of the following easy theorem [lemma 2 of Jennrich (1969)]:

Theorem 2.3.1: If $\phi(\theta,x)$ is a continuous real function on $\Theta \times R^m$, where
$\Theta$ is a compact subset of $R^k$, then $\sup_{\theta\in\Theta} \phi(\theta,x)$ and $\inf_{\theta\in\Theta} \phi(\theta,x)$
are continuous functions on $R^m$.

Proof: Choose $x_0 \in R^m$ and $\varepsilon > 0$ arbitrarily. Since $\Theta$ is compact and
$\Xi = \{x \in R^m : |x - x_0| \le \delta_*\}$ is compact for each $\delta_* > 0$ it follows that
$\phi(\theta,x)$ is uniformly continuous on the compact set $\Theta \times \Xi$. So $\delta > 0$ can be
chosen such that for all $\theta_*, \theta \in \Theta$ and all $x_*, x \in \Xi$ satisfying
$|(\theta_*,x_*) - (\theta,x)| < \delta$ we have $|\phi(\theta_*,x_*) - \phi(\theta,x)| < \varepsilon$, hence for every $\theta \in \Theta$
and $|x - x_0| < \delta$,

$$\phi(\theta,x_0) - \varepsilon \le \phi(\theta,x) \le \phi(\theta,x_0) + \varepsilon.$$

Consequently:

$$\sup_{\theta\in\Theta} \phi(\theta,x_0) - \varepsilon \le \sup_{\theta\in\Theta} \phi(\theta,x) \le \sup_{\theta\in\Theta} \phi(\theta,x_0) + \varepsilon$$

if $|x - x_0| < \delta$. This shows that $\sup_{\theta\in\Theta} \phi(\theta,x)$ is continuous in $x_0$. By changing
"sup" into "inf" we see that also $\inf_{\theta\in\Theta} \phi(\theta,x)$ is continuous in $x_0$. □
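Theorem 2.3.1 can be visualised with a one-parameter example in which the supremum and infimum over $\Theta$ have closed forms. The Python sketch below assumes $\phi(\theta,x) = (x-\theta)^2$ and $\Theta = [0,1]$ (a squared-residual type objective chosen only for illustration) and compares a grid search over $\Theta$ with the closed forms $\max(x^2, (x-1)^2)$ and the squared distance from $x$ to $[0,1]$. All names and grid sizes are hypothetical.

```python
def phi(theta, x):
    """Illustrative continuous function on Theta x R: a squared residual."""
    return (x - theta) ** 2


THETA_GRID = [i / 1000.0 for i in range(1001)]  # grid on the compact set [0, 1]


def sup_phi(x):
    """Grid approximation of sup over theta in [0, 1] of phi(theta, x)."""
    return max(phi(theta, x) for theta in THETA_GRID)


def inf_phi(x):
    """Grid approximation of inf over theta in [0, 1] of phi(theta, x)."""
    return min(phi(theta, x) for theta in THETA_GRID)


def sup_closed(x):
    """The supremum is attained at an endpoint of [0, 1]."""
    return max(x * x, (x - 1.0) ** 2)


def inf_closed(x):
    """The infimum is the squared distance from x to [0, 1]."""
    nearest = min(max(x, 0.0), 1.0)
    return (x - nearest) ** 2


xs = [-2.0 + 0.01 * i for i in range(401)]  # x ranging over [-2, 2]
max_err = max(
    max(abs(sup_phi(x) - sup_closed(x)), abs(inf_phi(x) - inf_closed(x)))
    for x in xs
)
print(max_err)
```

Both closed forms are continuous in $x$ (though not differentiable everywhere), in line with the theorem; the grid search agrees with them up to the grid resolution.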

Since continuous real functions are Borel measurable [see theorem 2.1.4] it
follows now that for the case under review, $\sup_{\theta\in\Theta} \phi(\theta,x)$ and $\inf_{\theta\in\Theta} \phi(\theta,x)$
are random variables if $\Theta$ is compact. But this holds also if $\Theta$ is
unbounded, provided that

$$\Theta_n = \Theta \cap \{\theta \in R^k : |\theta| \le n\}$$

is compact for each n, because we then have

$$\sup_{\theta\in\Theta} \phi(\theta,x) = \lim_{n\to\infty} \sup_{\theta\in\Theta_n} \phi(\theta,x), \qquad \inf_{\theta\in\Theta} \phi(\theta,x) = \lim_{n\to\infty} \inf_{\theta\in\Theta_n} \phi(\theta,x),$$

which by theorem 2.1.3 and theorem 2.3.1 are Borel measurable functions.
Finally, since $\phi(\theta,x)$ is Borel measurable on $\Theta \times R^m$ it follows that for
any pair of random vectors $\underline{\theta}$ and $x$ in $\Theta \times R^m$, $\phi(\underline{\theta},x)$ is a random variable.

Let us return to more general random functions. The properties of a
random function $f(\theta) = f(\theta,\omega)$ may differ for different $\omega$ in $\Omega$. For two points
$\omega_1$ and $\omega_2$ in $\Omega$ it is possible, for example, that $f(\theta,\omega_1)$ is continuous and
$f(\theta,\omega_2)$ is discontinuous at the same $\theta$. It is even possible that a random
function is not defined for each $\omega$ in a set in $\mathcal{F}$, depending on $\theta$, with
probability zero.

In this study we shall always consider properties of random functions
holding almost surely, which means that a property of $f(\theta) = f(\theta,\omega)$ holds for
all $\omega$ in a set $E \in \mathcal{F}$ with $P(E) = 1$. Thus, for example, the statement "$f(\theta)$ is
a.s. continuous on $\Theta$" means that there is a null set $N$ such that $f(\theta,\omega)$ is
continuous on $\Theta$ for all $\omega \in \Omega \setminus N$.

Next we shall pay attention to convergence of random functions, as here we
are confronted with a similar problem. Let $f_n(\theta)$ and $f(\theta)$ be random functions
on a subset $\Theta$ of $R^k$ such that for each $\theta \in \Theta$, $f_n(\theta) \to f(\theta)$ a.s. as $n \to \infty$. Then
at first sight we should expect from definition 2.2.2 that there is a null set
$N$ and an integer function $n_0(\omega,\theta,\varepsilon)$ such that for every $\varepsilon > 0$ and every
$\omega \in \Omega \setminus N$, $|f_n(\theta,\omega) - f(\theta,\omega)| \le \varepsilon$ if $n \ge n_0(\omega,\theta,\varepsilon)$. But reading definition 2.2.2
carefully we see that this is not correct, because the null set $N$ may depend
on $\theta$: $N = N_\theta$. Then again at first sight we may reply that this does not matter,
because we could choose $N = \bigcup_{\theta\in\Theta} N_\theta$ as a null set. But the problem now is that
we are not sure whether $N \in \mathcal{F}$, for only countable unions of members of $\mathcal{F}$ are
surely members of $\mathcal{F}$ themselves. Thus although $N_\theta \in \mathcal{F}$ for each $\theta \in \Theta$, this is not
necessarily the case for $\bigcup_{\theta\in\Theta} N_\theta$ if $\Theta$ is uncountable. Moreover, even if
$\bigcup_{\theta\in\Theta} N_\theta \in \mathcal{F}$, it may fail to be a null set itself if $\Theta$ is uncountable. For example,
let $\Theta = \Omega = [0,1]$, let $P$ be the Lebesgue measure on $[0,1]$ and let $N_\theta = \{\theta\}$ for
$\theta \in [0,1]$. Then $P(\bigcup_{\theta\in\Theta} N_\theta) = P(\Omega) = 1$, while obviously the $N_\theta$'s are null sets.

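As is well known, uniform convergence of (real) nonrandom functions, for
example $\phi_n(\theta) \to \phi(\theta)$ uniformly on $\Theta$ as $n \to \infty$, can be defined as

$$\sup_{\theta\in\Theta} |\phi_n(\theta) - \phi(\theta)| \to 0 \quad\text{for } n \to \infty.$$

Dealing with uniform a.s. convergence of random functions,

$$f_n(\theta) \to f(\theta) \quad\text{a.s. uniformly on } \Theta,$$

a suitable definition is therefore:

$$\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)| \to 0 \quad\text{a.s. as } n \to \infty,$$

or in other words: there is a null set $N$ and an integer function $n_0(\omega,\varepsilon)$ such
that for every $\varepsilon > 0$ and every $\omega \in \Omega \setminus N$,

$$\sup_{\theta\in\Theta} |f_n(\theta,\omega) - f(\theta,\omega)| \le \varepsilon \quad\text{if } n \ge n_0(\omega,\varepsilon).$$

However, this has only a probabilistic meaning if $\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$ is
a random variable for each $n$. Only if so, we shall say that $f_n(\theta) \to f(\theta)$ a.s.
uniformly on $\Theta$. Nevertheless, if $\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$ is not a random
variable but if $f_n(\theta,\omega) \to f(\theta,\omega)$ uniformly on $\Theta$ for every $\omega$ in $\Omega$ except on
a null set, then we still have a useful property, as will turn out. In this
case we shall say that $f_n(\theta) \to f(\theta)$ a.s. pseudo-uniformly on $\Theta$.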

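Definition 2.3.2: Let $f(\theta)$ and $f_n(\theta)$ be random functions on a subset
$\Theta$ of a Euclidean space, and let $\{\Omega,\mathcal{F},P\}$ be the probability space
involved. Then

(a) $f_n(\theta) \to f(\theta)$ a.s. pointwise on $\Theta$ if for every $\theta \in \Theta$ there is
a null set $N_\theta$ in $\mathcal{F}$ and for every $\varepsilon > 0$ and every $\omega \in \Omega \setminus N_\theta$ a number
$n_0(\omega,\theta,\varepsilon)$ such that $|f_n(\theta,\omega) - f(\theta,\omega)| \le \varepsilon$ if $n \ge n_0(\omega,\theta,\varepsilon)$;

(b) $f_n(\theta) \to f(\theta)$ a.s. uniformly on $\Theta$ if

(i) $\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$ is a random variable for $n = 1,2,\ldots$

and if

(ii) there is a null set $N$ and for every $\varepsilon > 0$ and $\omega \in \Omega \setminus N$ a number
$n_0(\omega,\varepsilon)$ such that

$$\sup_{\theta\in\Theta} |f_n(\theta,\omega) - f(\theta,\omega)| \le \varepsilon \quad\text{if } n \ge n_0(\omega,\varepsilon);$$

(c) $f_n(\theta) \to f(\theta)$ a.s. pseudo-uniformly on $\Theta$ if condition (ii) in (b)
holds, but not necessarily condition (i).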

Similarly to the case of a.s. uniform convergence of random functions, the
uniform convergence in probability of $f_n(\theta)$ to $f(\theta)$ can be defined by
$\operatorname{plim} \sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)| = 0$, provided that $\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$ is a random variable
for $n = 1,2,\ldots$ In that case it follows from theorem 2.2.4 that $f_n(\theta) \to f(\theta)$ in pr.
uniformly on $\Theta$ if and only if every subsequence $(n_k)$ of $(n)$ contains a further
subsequence $(n_{k_j})$ such that $f_{n_{k_j}}(\theta) \to f(\theta)$ a.s. uniformly on $\Theta$. This suggests
how to define pseudo-uniform convergence in pr.:

Definition 2.3.3: Let $f_n(\theta)$ and $f(\theta)$ be random functions on a subset $\Theta$
of a Euclidean space. Then

a) $f_n(\theta) \to f(\theta)$ in pr. uniformly on $\Theta$ if $\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$
is a random variable for $n = 1,2,\ldots$ satisfying

$$\operatorname{plim}\,\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)| = 0;$$

b) $f_n(\theta) \to f(\theta)$ in pr. pseudo-uniformly on $\Theta$ if every subsequence $(n_k)$
of $(n)$ contains a further subsequence $(n_{k_j})$ such that $f_{n_{k_j}}(\theta) \to f(\theta)$ a.s.
pseudo-uniformly on $\Theta$.



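Remark. In this study we shall often conclude

$$\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)| \to 0 \quad\text{a.s.}$$

or

$$\operatorname{plim}\,\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)| = 0$$

instead of

$$f_n(\theta) \to f(\theta) \quad\text{a.s. uniformly on } \Theta$$

or

$$f_n(\theta) \to f(\theta) \quad\text{in pr. uniformly on } \Theta,$$

respectively. In these cases it will be clear from the context that
$\sup_{\theta\in\Theta} |f_n(\theta) - f(\theta)|$ is a random variable for $n = 1,2,\ldots$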

We are now able to generalize theorem 2.2.5 for random functions.

Theorem 2.3.2. Let $(f_n(\theta))$ be a sequence of random functions on a Borel
subset $\Theta$ of a Euclidean space. Let $f(\theta)$ be an a.s. continuous random
function on $\Theta$. Let $x_n$ and $x$ be random vectors such that
$P(x \in \Theta) = 1$ and $P(x_n \in \Theta) = 1$, $n = 1,2,\ldots$ Suppose that $f(x)$ is a
random variable and that $f_n(x_n)$ is a random variable for $n = 1,2,\ldots$ If:

a) $x_n \to x$ a.s. and $f_n(\theta) \to f(\theta)$ a.s. pseudo-uniformly on $\Theta$,

or if:

b) $x_n \to x$ in pr. and $f_n(\theta) \to f(\theta)$ in pr. pseudo-uniformly on $\Theta$,

then

a) $f_n(x_n) \to f(x)$ a.s.

or

b) $f_n(x_n) \to f(x)$ in pr.,

respectively.



Proof: a) Let $\{\Omega,\mathcal{F},P\}$ be the probability space. Let $N_1$ be the null set on which
$x_n(\omega) \to x(\omega)$ fails to hold, let $N_2$ be the null set on which $f(\theta,\omega)$ fails to be
continuous, let $N_3$ and $N_{3,n}$ be null sets on which $x(\omega) \in \Theta$ and $x_n(\omega) \in \Theta$,
respectively, fail to hold, and finally let $N_4$ be the null set on which
$\sup_{\theta\in\Theta} |f_n(\theta,\omega) - f(\theta,\omega)| \to 0$ fails to hold. Put

$$N = N_1 \cup N_2 \cup N_3 \cup \Big\{\bigcup_{n=1}^{\infty} N_{3,n}\Big\} \cup N_4.$$

Then $N \in \mathcal{F}$, $P(N) = 0$ and for $\omega \in \Omega \setminus N$ we have

$$|f_n(x_n(\omega),\omega) - f(x(\omega),\omega)| \le |f_n(x_n(\omega),\omega) - f(x_n(\omega),\omega)| + |f(x_n(\omega),\omega) - f(x(\omega),\omega)|$$

$$\le \sup_{\theta\in\Theta} |f_n(\theta,\omega) - f(\theta,\omega)| + |f(x_n(\omega),\omega) - f(x(\omega),\omega)| \to 0 \quad\text{as } n \to \infty.$$

This proves part a). Part b) follows from a) by using theorem 2.2.4.



2.3.2 Uniform strong and weak laws of large numbers 

Next we shall extend theorems 1 and 2 of Jennrich (1969). We shall
closely follow Jennrich's proof, but instead of the Helly-Bray theorem
(theorem 2.2.10) we shall now use theorem 2.2.17. The extension involved is:



Theorem 2.3.3. Let $x_1, x_2, \ldots$ be a sequence of independent random vectors
in $R^k$ with distribution functions $F_1, F_2, \ldots$, respectively.
Let $f(x,\theta)$ be a continuous real function on $R^k \times \Theta$, where $\Theta$ is a
compact subset of $R^m$.
Let

$$\psi(a) = \sup_{\{|x| \le a\}}\,\sup_{\theta\in\Theta} |f(x,\theta)|.$$

If: 



® (2.3.1) 

and both: 



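$$\sup_n \frac{1}{n}\sum_{j=1}^{n} E\,\psi(|x_j|)^{1+\delta} < \infty \quad\text{for some } \delta > 0, \tag{2.3.2}$$

$$E\big(\psi(|x_j|)\big)^2 = O(j^{\mu}) \quad\text{for some } \mu < 1, \tag{2.3.3}$$

then $\frac{1}{n}\sum_{j=1}^{n} f(x_j,\theta) \to \int f(x,\theta)\,dG(x)$ a.s. uniformly on $\Theta$.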

Proof: From theorem 2.3.1 it follows that $\sup_{\theta\in\Theta} |f(x,\theta)|$ is a continuous function
of $x$ and consequently it follows from lemma 2.2.1 that $\psi(a)$ is a continuous
function on $[0,\infty)$. Hence $\psi(|x_j|)$ is a random variable, and therefore the
conditions (2.3.2) and (2.3.3) make sense.

For the sake of convenience and clarity we shall label the main steps of
the proof.



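Step 1:

Choose $\theta_0$ arbitrarily in $\Theta$ and put $\Gamma_\delta = \{\theta \in R^m : |\theta - \theta_0| \le \delta\} \cap \Theta$, for $\delta \ge 0$. Then
for any $\delta \ge 0$, $\sup_{\theta\in\Gamma_\delta} f(x,\theta)$ and $\inf_{\theta\in\Gamma_\delta} f(x,\theta)$ are continuous functions on $R^k$, because
$\Gamma_\delta$ is a closed subset of a compact set and therefore [see Rudin (1976), theorem
2.35] compact itself (compare theorem 2.3.1). Moreover,

$$\Big|\sup_{\theta\in\Gamma_\delta} f(x,\theta)\Big| \le \sup_{\theta\in\Theta} |f(x,\theta)| \le \psi(|x|), \tag{2.3.4}$$

$$\Big|\inf_{\theta\in\Gamma_\delta} f(x,\theta)\Big| \le \sup_{\theta\in\Theta} |f(x,\theta)| \le \psi(|x|). \tag{2.3.5}$$

Thus it follows from theorem 2.2.17 and the conditions (2.3.1), (2.3.2) and
(2.3.3) that

$$\frac{1}{n}\sum_{j=1}^{n}\,\sup_{\theta\in\Gamma_\delta} f(x_j,\theta) \to \int \sup_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x) \quad\text{a.s.} \tag{2.3.6}$$

and

$$\frac{1}{n}\sum_{j=1}^{n}\,\inf_{\theta\in\Gamma_\delta} f(x_j,\theta) \to \int \inf_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x) \quad\text{a.s.} \tag{2.3.7}$$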

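Step 2:

Similarly to the proof of theorem 2.2.5 it follows from the conditions
(2.3.1) and (2.3.2) that:

$$\int \psi(|x|)\,dG(x) < \infty,$$

while by (2.3.4) and (2.3.5) we have for every $\delta \ge 0$:

$$\Big|\sup_{\theta\in\Gamma_\delta} f(x,\theta) - \inf_{\theta\in\Gamma_\delta} f(x,\theta)\Big| \le 2\psi(|x|).$$

Since $f(x,\theta)$ is continuous in $\theta$, the left-hand side tends to zero for each $x$
as $\delta \downarrow 0$. Therefore

$$\lim_{\delta\downarrow 0}\Big|\int \sup_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x) - \int \inf_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x)\Big| = 0 \tag{2.3.8}$$

by dominated convergence.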

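Step 3:

Choose $\varepsilon > 0$ arbitrarily. From (2.3.8) it follows that $\delta > 0$ can be chosen so
small, say $\delta = \delta(\varepsilon)$, that:

$$0 \le \int \sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x) - \int \inf_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x) \le \tfrac{1}{2}\varepsilon. \tag{2.3.9}$$

Let $\{\Omega,\mathcal{F},P\}$ be the probability space involved. From (2.3.6) and (2.3.7) it
follows that there is a null set $N$ and for each $\omega \in \Omega \setminus N$ a number $n_0(\omega,\varepsilon)$ such that

$$\Big|\frac{1}{n}\sum_{j=1}^{n}\,\sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x_j(\omega),\theta) - \int \sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x)\Big| \le \tfrac{1}{2}\varepsilon, \tag{2.3.10}$$

$$\Big|\frac{1}{n}\sum_{j=1}^{n}\,\inf_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x_j(\omega),\theta) - \int \inf_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x)\Big| \le \tfrac{1}{2}\varepsilon \tag{2.3.11}$$

if $n \ge n_0(\omega,\varepsilon)$. From (2.3.9), (2.3.10) and (2.3.11) it follows now that for all
$\omega \in \Omega \setminus N$, all $n \ge n_0(\omega,\varepsilon)$ and all $\theta \in \Gamma_{\delta(\varepsilon)}$:

$$\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x) \le \frac{1}{n}\sum_{j=1}^{n}\,\sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x_j(\omega),\theta) - \int \inf_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x)$$

$$\le \Big|\frac{1}{n}\sum_{j=1}^{n}\,\sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x_j(\omega),\theta) - \int \sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x)\Big| + \Big|\int \sup_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x) - \int \inf_{\theta\in\Gamma_{\delta(\varepsilon)}} f(x,\theta)\,dG(x)\Big| \le \varepsilon$$

and similarly:

$$\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x) \ge -\varepsilon.$$

Thus for $\omega \in \Omega \setminus N$ and $n \ge n_0(\omega,\varepsilon)$ we have:

$$\sup_{\theta\in\Gamma_{\delta(\varepsilon)}} \Big|\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x)\Big| \le \varepsilon.$$

We note that the null set $N$ and the number $n_0(\omega,\varepsilon)$ depend on the set $\Gamma_{\delta(\varepsilon)}$, which
in its turn depends on $\theta_0$ and $\varepsilon$. Thus the above result should be restated as
follows. For every $\theta_0$ in $\Theta$ and every $\varepsilon > 0$ there is a null set $N(\theta_0,\varepsilon)$ and an
integer function $n_0(\cdot,\varepsilon,\theta_0)$ on $\Omega \setminus N(\theta_0,\varepsilon)$ such that for $\omega \in \Omega \setminus N(\theta_0,\varepsilon)$ and
$n \ge n_0(\omega,\varepsilon,\theta_0)$:

$$\sup_{\theta\in\Gamma_{\delta(\varepsilon)}(\theta_0)} \Big|\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x)\Big| \le \varepsilon, \tag{2.3.12}$$

where $\delta(\varepsilon)$ is some positive (real) valued function of $\varepsilon \in (0,\infty)$ and

$$\Gamma_\delta(\theta_0) = \{\theta \in R^m : |\theta - \theta_0| \le \delta\} \cap \Theta. \tag{2.3.13}$$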



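Step 4:

The collection of sets $\{\theta \in R^m : |\theta - \theta_0| < \delta\}$ with $\theta_0 \in \Theta$ is an open covering of
$\Theta$. Since $\Theta$ is compact, there exists by definition of compactness a finite
subcovering. Thus there are a finite number of points in $\Theta$, say
$\theta_{\delta,1}, \ldots, \theta_{\delta,r_\delta}$ with $r_\delta < \infty$, such that

$$\Theta \subset \bigcup_{i=1}^{r_\delta} \{\theta \in R^m : |\theta - \theta_{\delta,i}| < \delta\} \subset \bigcup_{i=1}^{r_\delta} \{\theta \in R^m : |\theta - \theta_{\delta,i}| \le \delta\}.$$

From (2.3.13) we therefore have:

$$\Theta = \bigcup_{i=1}^{r_\delta} \Gamma_\delta(\theta_{\delta,i}). \tag{2.3.14}$$

Now put:

$$N_\varepsilon = \bigcup_{i=1}^{r_{\delta(\varepsilon)}} N(\theta_{\delta(\varepsilon),i},\varepsilon),$$

$$n_*(\omega,\varepsilon) = \max_{1\le i\le r_{\delta(\varepsilon)}} n_0(\omega,\varepsilon,\theta_{\delta(\varepsilon),i}).$$

Then by (2.3.12), (2.3.13) and (2.3.14) we have for $\omega \in \Omega \setminus N_\varepsilon$ and $n \ge n_*(\omega,\varepsilon)$,

$$\sup_{\theta\in\Theta} \Big|\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x)\Big| \le \max_{1\le i\le r_{\delta(\varepsilon)}}\,\sup_{\theta\in\Gamma_{\delta(\varepsilon)}(\theta_{\delta(\varepsilon),i})} \Big|\frac{1}{n}\sum_{j=1}^{n} f(x_j(\omega),\theta) - \int f(x,\theta)\,dG(x)\Big| \le \varepsilon.$$

Since, similarly to the proof of theorem 2.2.3, it can be shown that the null
set $N_\varepsilon$ can be chosen independent of $\varepsilon$, it follows now that

$$\frac{1}{n}\sum_{j=1}^{n} f(x_j,\theta) \to \int f(x,\theta)\,dG(x) \quad\text{a.s. pseudo-uniformly on } \Theta. \tag{2.3.15}$$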

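Step 5:

From (2.3.8) it follows that $\int f(x,\theta)\,dG(x)$ is a continuous function on $\Theta$.
Using theorem 2.3.1, it is now easy to verify that

$$\sup_{\theta\in\Theta} \Big|\frac{1}{n}\sum_{j=1}^{n} f(x_j,\theta) - \int f(x,\theta)\,dG(x)\Big|$$

is a random variable, so that (2.3.15) becomes

$$\frac{1}{n}\sum_{j=1}^{n} f(x_j,\theta) \to \int f(x,\theta)\,dG(x) \quad\text{a.s. uniformly on } \Theta.$$

This completes the proof. □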
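The uniform convergence asserted by theorem 2.3.3 can be made tangible with a small simulation. The sketch below rests on assumptions that are not in the text: the $x_j$ are i.i.d. standard normal (so every distribution function equals $G$), $f(x,\theta) = \cos(\theta x)$, and $\Theta = [0,2]$ is replaced by a finite grid, so the reported quantity only approximates the supremum over $\Theta$. Since $|f| \le 1$, the moment conditions (2.3.2) and (2.3.3) hold trivially in this example, and the limit is $\int \cos(\theta x)\,dG(x) = e^{-\theta^2/2}$.

```python
import math
import random

def sup_deviation(n, thetas, rng):
    # Draw x_1, ..., x_n i.i.d. N(0, 1); in this illustration every F_j equals G.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    worst = 0.0
    for t in thetas:
        sample_mean = sum(math.cos(t * x) for x in xs) / n
        limit = math.exp(-0.5 * t * t)  # integral of cos(t x) dG(x) for G = N(0, 1)
        worst = max(worst, abs(sample_mean - limit))
    return worst

rng = random.Random(12345)
thetas = [2.0 * i / 40 for i in range(41)]  # finite grid on the compact set [0, 2]
deviations = {n: sup_deviation(n, thetas, rng) for n in (100, 1000, 20000)}
for n, d in deviations.items():
    print(n, d)
```

Each sample size uses a fresh draw, so the printed deviations need not decrease monotonically, but for large $n$ the grid supremum should be small.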



If condition (2.3.3) is not satisfied then we can no longer apply theorem
2.2.17 for proving (2.3.6) and (2.3.7). But applying theorem 2.2.16 we see that
(2.3.6) and (2.3.7) still hold in probability. From theorem 2.2.4 it then
follows that any subsequence $(n_k)$ of $(n)$ contains further subsequences
$(n^{(1)}_{k_m})$ and $(n^{(2)}_{k_m})$, say, such that

$$\frac{1}{n^{(1)}_{k_m}}\sum_{j=1}^{n^{(1)}_{k_m}}\,\sup_{\theta\in\Gamma_\delta} f(x_j,\theta) \to \int \sup_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x) \quad\text{a.s. as } m \to \infty,$$

$$\frac{1}{n^{(2)}_{k_m}}\sum_{j=1}^{n^{(2)}_{k_m}}\,\inf_{\theta\in\Gamma_\delta} f(x_j,\theta) \to \int \inf_{\theta\in\Gamma_\delta} f(x,\theta)\,dG(x) \quad\text{a.s. as } m \to \infty.$$

But $(n^{(2)}_{k_m})$ may also be considered as a subsequence of $(n^{(1)}_{k_m})$, hence without loss
of generality we may assume that these further subsequences are equal:

$$n_{k_m} = n^{(1)}_{k_m} = n^{(2)}_{k_m}.$$

We now conclude from the argument in the proof of theorem 2.3.3 that:

$$\sup_{\theta\in\Theta} \Big|\frac{1}{n_{k_m}}\sum_{j=1}^{n_{k_m}} f(x_j,\theta) - \int f(x,\theta)\,dG(x)\Big| \to 0 \quad\text{a.s. as } m \to \infty.$$

Again using theorem 2.2.4 we then conclude:

Theorem 2.3.4. Let the conditions of theorem 2.3.3 be satisfied, except
(2.3.3). Then

$$\frac{1}{n}\sum_{j=1}^{n} f(x_j,\theta) \to \int f(x,\theta)\,dG(x) \quad\text{in pr. uniformly on } \Theta.$$

Functions $\psi$ as considered in theorem 2.3.3 will be frequently used in this
study. Therefore a more compact notation is convenient [compare definition
2.2.6].

Definition 2.3.4. Let $\phi(x,\theta)$ be a real function on $X \times \Theta$, where $X$ is a
Euclidean space. Then by $\phi(x,\theta)^{\Theta}$ we denote:

$$\phi(x,\theta)^{\Theta} = \psi(|x|),$$

where $\psi(a) = \sup_{\{|x| \le a\}}\,\sup_{\theta\in\Theta} |\phi(x,\theta)|$, $a \ge 0$.

The reader is invited to verify the following easy but useful (in)equalities:

$$\{\phi(x,\theta)^p\}^{\Theta} = \{\phi(x,\theta)^{\Theta}\}^p \quad\text{for } p > 0, \tag{2.3.16}$$

$$\{\phi_1(x,\theta)\,\phi_2(x,\theta)\}^{\Theta} \le \{\phi_1(x,\theta)^{\Theta}\}\{\phi_2(x,\theta)^{\Theta}\}, \tag{2.3.17}$$

$$\{\phi_1(x,\theta) + \phi_2(x,\theta)\}^{\Theta} \le \phi_1(x,\theta)^{\Theta} + \phi_2(x,\theta)^{\Theta}. \tag{2.3.18}$$

Moreover, it follows from lemma 2.2.1 and theorem 2.3.1 that if $\phi(x,\theta)$ is continuous
on $X \times \Theta$, where $X$ is a Euclidean space and $\Theta$ a compact subset of a Euclidean
space, then $\phi(x,\theta)^{\Theta}$ is a continuous function of $|x|$.
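The envelope $\psi$ of definition 2.3.4 and the (in)equalities (2.3.16) to (2.3.18) can be checked numerically. What follows is a rough sketch under assumptions of our own: $X = R$, $\Theta = [0,1]$, the suprema are approximated on finite grids, and `phi1` and `phi2` are hypothetical positive functions (positivity keeps the power in (2.3.16) well defined for non-integer $p$).

```python
import math

THETAS = [i / 50 for i in range(51)]  # finite grid on Theta = [0, 1]

def envelope(phi, a, steps=200):
    # Grid approximation of psi(a) = sup over |x| <= a and theta of |phi(x, theta)|.
    xs = [-a + 2 * a * i / steps for i in range(steps + 1)]
    return max(abs(phi(x, t)) for x in xs for t in THETAS)

def phi1(x, theta):
    # Hypothetical positive example function.
    return math.exp(theta * x)

def phi2(x, theta):
    # Second hypothetical positive example function.
    return 1.0 + theta * x * x

a = 1.5
e1, e2 = envelope(phi1, a), envelope(phi2, a)
e_prod = envelope(lambda x, t: phi1(x, t) * phi2(x, t), a)
e_sum = envelope(lambda x, t: phi1(x, t) + phi2(x, t), a)
e_pow = envelope(lambda x, t: phi1(x, t) ** 2.5, a)

print("(2.3.16) holds:", math.isclose(e_pow, e1 ** 2.5))
print("(2.3.17) holds:", e_prod <= e1 * e2)
print("(2.3.18) holds:", e_sum <= e1 + e2)
```

Because all envelopes are taken over the same grids, the grid versions of (2.3.17) and (2.3.18) hold for the same reason as the exact ones: a supremum of a product or sum never exceeds the product or sum of the suprema.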



2.4 Characteristic functions, stable distributions and a central limit theorem

Characteristic functions and central limit theorems are standard tools in
mathematical statistics and hence in econometrics. Therefore we only summarize
the results that are important for our purpose, and for the proofs we refer to
textbooks like Feller (1966), Chung (1974), or Wilks (1963).

Definition 2.4.1: Let $x$ be a random vector in $R^k$ with distribution
function $F$. The characteristic function of this distribution is the
following complex valued function $\phi(t)$ on $R^k$:

$$\phi(t) = E\,e^{i t'x} = E\cos(t'x) + i\,E\sin(t'x).$$
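As a concrete instance of definition 2.4.1, the sketch below compares a sample analogue of $E\,e^{itx}$ with the known characteristic function $e^{-t^2/2}$ of the standard normal distribution. The scalar case $k = 1$, the sample size, the seed and the helper name `empirical_cf` are assumptions made for this illustration only.

```python
import cmath
import math
import random

def empirical_cf(sample, t):
    # Sample analogue of E exp(i t x) for a scalar random variable x.
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)

rng = random.Random(2024)
sample = [rng.gauss(0.0, 1.0) for _ in range(50000)]

for t in (0.0, 0.5, 1.0, 2.0):
    est = empirical_cf(sample, t)
    exact = math.exp(-0.5 * t * t)  # characteristic function of N(0, 1) at t
    print(t, est.real, est.imag, exact)
```

The real part of the estimate approximates $E\cos(tx)$ and the imaginary part approximates $E\sin(tx)$, which vanishes here by symmetry of the normal distribution.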

Distributions are fully determined by their characteristic functions:
distributions are equal if and only if their characteristic functions are equal.
Moreover, convergence in distribution is closely related to convergence of
characteristic functions:




